Statistical Methods in Medical Research 4ed

An alternative set of weights, which penalizes disagreements by the square of the number of categories of disagreement, is

$$ w_i = 1 - \frac{i^2}{(k - 1)^2}. \qquad (19.50) $$
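As a small numerical illustration (not from the text), the full k × k table of these quadratic weights can be tabulated from (19.50), taking i as the number of categories by which the two ratings differ. The following Python sketch is purely illustrative; the function name and the choice k = 4 are assumptions.

```python
# Sketch: quadratic disagreement weights of equation (19.50),
# w_i = 1 - i^2 / (k - 1)^2, where i is the number of categories of
# disagreement and k is the number of categories.
# The function name and the example k = 4 are illustrative only.

def quadratic_weights(k):
    """Return a k x k matrix with w[r][c] = 1 - (r - c)**2 / (k - 1)**2."""
    return [[1 - (r - c) ** 2 / (k - 1) ** 2 for c in range(k)] for r in range(k)]

if __name__ == "__main__":
    for row in quadratic_weights(4):
        print(["%.3f" % w for w in row])
    # Disagreements of 0, 1, 2, 3 categories receive weights 1, 8/9, 5/9, 0.
```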

Although κ provides a measure of agreement within a particular set of data, its use as a more general measure is limited. This is because its value is dependent not only on the intrinsic agreement of the two raters but also on the prevalence of the condition being assessed or, for multicategorical conditions, the distribution across the categories (Kraemer, 1979). Thus κ is not a general measure of agreement between raters but only of their agreement in a particular situation.
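To make this prevalence dependence concrete, the following Python sketch computes the unweighted κ for two hypothetical 2 × 2 tables (the counts are invented for illustration, not taken from the text). Both tables have the same observed proportion of agreement, yet κ differs markedly because the prevalence of the condition differs.

```python
# Sketch: unweighted kappa for a 2 x 2 table of counts, with
#   a = both raters positive, b and c = discordant cells, d = both negative.
# The two tables below are hypothetical, chosen only to illustrate the
# prevalence dependence noted by Kraemer (1979): both have observed
# agreement 0.85, but kappa is 0.70 when prevalence is moderate and
# about 0.32 when the condition is very common.

def kappa(a, b, c, d):
    n = a + b + c + d
    p_obs = (a + d) / n                                      # observed agreement
    p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2  # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)

print(kappa(40, 5, 10, 45))   # balanced prevalence: kappa = 0.70
print(kappa(80, 5, 10, 5))    # high prevalence:     kappa ~ 0.32
```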

The problems discussed in the previous paragraph are analysed in detail by Cicchetti and Feinstein (1990) and Feinstein and Cicchetti (1990). They proposed that, just as agreement between a diagnostic test and disease status would not normally be summarized in terms of a single index, but in terms of sensitivity for the disease positives and specificity for the disease negatives, so also agreement between raters should be expressed by two measures. The situations are not identical, since in the former disease status represents the true situation against which the diagnostic test is assessed, whereas in the latter the two raters have to be treated on an equal basis. The measures proposed are p_pos, the number of agreed positives divided by the average number of positives for the two raters, and p_neg, defined similarly for the negatives. That is,

$$ p_{\mathrm{pos}} = \frac{a}{\tfrac{1}{2}[(a + b) + (a + c)]} = \frac{2a}{2a + b + c}, $$
$$ p_{\mathrm{neg}} = \frac{d}{\tfrac{1}{2}[(c + d) + (b + d)]} = \frac{2d}{2d + b + c}. $$
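A minimal Python sketch of these two indices, using the usual 2 × 2 labelling (a = agreed positives, d = agreed negatives, b and c the discordant cells); the function names and the example counts are illustrative and are not the tables of situations A, B or Example 19.9.

```python
# Sketch: positive and negative agreement indices of Cicchetti and
# Feinstein (1990): p_pos = 2a / (2a + b + c), p_neg = 2d / (2d + b + c),
# for a 2 x 2 table with a = agreed positives, d = agreed negatives,
# and b, c the two discordant cells. Names and counts are illustrative.

def p_pos(a, b, c, d):
    return 2 * a / (2 * a + b + c)

def p_neg(a, b, c, d):
    return 2 * d / (2 * d + b + c)

a, b, c, d = 30, 10, 5, 55     # hypothetical counts
print(round(p_pos(a, b, c, d), 2))   # 0.80
print(round(p_neg(a, b, c, d), 2))   # 0.88
```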

For the three situations considered earlier,

             A       B     Example 19.9
  p_pos     0.50    0.88       0.94
  p_neg     0.94    0.92       0.78
  κ         0.44    0.79       0.72

In A and B the two raters have similar agreement when the condition is absent, but in B the raters have much better agreement when the condition is present. The two measures provide information about the areas of agreement and disagreement which is not identifiable from a single measure.
