
CHAPTER 4. PSYCHOLINGUISTIC EXPERIMENTS

The rows represent the speakers, and the columns represent the frequencies of the subjects' judgements (called "ratings"). Suppose further that Speaker A is really from New York, B from Chicago, and C from Philadelphia; then the numbers on the diagonal represent the number of agreements between the ratings and the actual dialects of the speakers.

In Table 4.1 (b), the simple agreement between the actual dialect and the ratings is 2/3, or 0.67. But (c) will have exactly the same simple agreement, even though it is clear that the raters are agreeing among themselves in ways that do not match the actual dialects. In Table 4.1 (d), the simple agreement is 1/3 (0.33), even though the data do not show any relation at all between the actual dialects and the ratings.

(a)
       NY   Chi   Phil
A      10     0      2
B       2    10      0
C       0     2     10
simple = 0.83,  K = 0.55

(b)
       NY   Chi   Phil
A       8     2      2
B       2     8      2
C       2     2      8
simple = 0.67,  K = 0.18

(c)
       NY   Chi   Phil
A       8     0      4
B       0     8      4
C       4     0      8
simple = 0.67,  K = 0.24

(d)
       NY   Chi   Phil
A       4     4      4
B       4     4      4
C       4     4      4
simple = 0.33,  K = -0.09

Table 4.1: Examples of Measures of Agreement
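
The simple-agreement figures in Table 4.1 follow directly from the counts: simple agreement is just the proportion of ratings that fall on the diagonal, that is, ratings matching the speaker's actual dialect. The short Python sketch below is an illustration only (the data layout and names are mine, not the original's); it reproduces the four figures.

# Rows = speakers A, B, C (actual dialects NY, Chi, Phil); columns = rated dialect.
# Each cell is the number of subjects giving that rating (each row sums to 12).
tables = {
    "a": [[10, 0, 2], [2, 10, 0], [0, 2, 10]],
    "b": [[8, 2, 2], [2, 8, 2], [2, 2, 8]],
    "c": [[8, 0, 4], [0, 8, 4], [4, 0, 8]],
    "d": [[4, 4, 4], [4, 4, 4], [4, 4, 4]],
}

def simple_agreement(matrix):
    """Proportion of ratings on the diagonal, i.e. matching the actual dialect."""
    on_diagonal = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return on_diagonal / total

for name, m in tables.items():
    print(name, round(simple_agreement(m), 2))  # a 0.83, b 0.67, c 0.67, d 0.33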

Because of the limitations of simple agreement, it is often preferable to use the kappa statistic (Scott 1955, Cohen 1960), which is corrected for chance agreement. This is the standard statistic for inter-rater reliability when the number of categories is fixed for all raters. The basic formula is

    K = (P(A) - P(E)) / (1 - P(E))

where P(A) is the proportion of actual agreement, and P(E) is the proportion of expected agreement. Since the numerator is the difference between the actual and the expected, results no better than chance will give kappas around 0.
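
Because each speaker in Table 4.1 is judged by many raters, the K values printed there are consistent with the multi-rater form of the statistic (Fleiss's generalization of Scott's pi), in which P(A) is the average pairwise agreement among the raters of each speaker and P(E) comes from the pooled proportions of the three dialect labels. The Python sketch below is an illustration under that assumption, not the original computation (function and variable names are mine); it reproduces the K values shown in Table 4.1.

def kappa(matrix):
    """Multi-rater kappa: rows are speakers, cells are the number of raters
    assigning each dialect. An assumption about the exact variant used."""
    n_items = len(matrix)
    n_raters = sum(matrix[0])  # every speaker receives the same number of ratings (12 here)

    # P(A): average probability that two randomly chosen raters of the same speaker agree.
    p_a = sum(
        sum(c * (c - 1) for c in row) / (n_raters * (n_raters - 1))
        for row in matrix
    ) / n_items

    # P(E): chance agreement from the pooled proportion of each dialect label.
    total = n_items * n_raters
    col_props = [sum(row[j] for row in matrix) / total for j in range(len(matrix[0]))]
    p_e = sum(p * p for p in col_props)

    return (p_a - p_e) / (1 - p_e)

tables = {
    "a": [[10, 0, 2], [2, 10, 0], [0, 2, 10]],
    "b": [[8, 2, 2], [2, 8, 2], [2, 2, 8]],
    "c": [[8, 0, 4], [0, 8, 4], [4, 0, 8]],
    "d": [[4, 4, 4], [4, 4, 4], [4, 4, 4]],
}
for name, m in tables.items():
    print(name, round(kappa(m), 2))  # a 0.55, b 0.18, c 0.24, d -0.09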

Returning to Table 4.1, (a) shows that kappa drops off quite sharply when even a little disagreement occurs, and (c) gives a higher kappa (0.24) than (b) (0.18) because of the greater agreement of the raters among themselves, even if they are incorrect. As expected, kappa is near 0 (actually slightly negative) when there is no agreement among raters, as in (d). (For further discussion of the
