Research Matters
incorporate the magnitude of the disagreement to compute IRR estimates, with larger-magnitude disagreements resulting in lower ICCs than smaller-magnitude disagreements" (Hallgren, 2012, p. 31). The ICC used calculates the percentage of variance across all scores (i.e., from all raters) attributable to which script is being looked at¹. Higher ICC values indicate greater IRR, with an ICC estimate of 1 indicating perfect agreement and 0 indicating random agreement.
Cicchetti (1994), as cited in Hallgren (2012), provides commonly cited cutoffs for qualitative ratings of agreement based on ICC values: IRR is poor for ICC values less than 0.40, fair for values between 0.40 and 0.59, good for values between 0.60 and 0.74, and excellent for values between 0.75 and 1.0.
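The one-way ICC described above, together with Cicchetti's cutoffs, can be sketched as follows. This is a minimal illustration, not taken from the report: it assumes a complete n-scripts × k-raters ratings matrix with no missing values, and the report's exact ICC variant may differ in such details.

```python
import numpy as np

def icc_oneway(ratings):
    """One-way random-effects ICC: the share of total score variance
    attributable to which script (target) is being rated.
    `ratings` is an n_scripts x k_raters array with no missing values."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    # Between-scripts and within-scripts mean squares (one-way ANOVA)
    msb = k * ((ratings.mean(axis=1) - grand) ** 2).sum() / (n - 1)
    msw = ((ratings - ratings.mean(axis=1, keepdims=True)) ** 2).sum() / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

def cicchetti_label(icc):
    """Map an ICC value to Cicchetti's (1994) qualitative cutoffs."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"
```

For example, a matrix in which every rater gives each script the same score yields an ICC of 1.0 ("excellent"), while the 'informal language' value of 0.34 reported below falls in the "poor" band.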
Discussion of agreement between coders on the 100-word sampling frame
Table 4 on page 18 shows the simple agreement and ICC statistics for the 36 variables used in the 100-word sample. The ICC statistics were generally good; 20 variables had ICCs of over 0.75, six of which had perfect agreement. Three variables had aberrant ICC values because of a lack of variation in the ratings; these are denoted by an asterisk (*). Three variables fell below 0.6, with only one being considered poor (less than 0.4).
The ‘informal language’ count ICC was 0.34. One rater in particular gave far fewer examples than the others did. Analysis of the raters’ examples showed that, whilst more than one rater gave the majority of examples, many were unique. This indicates that agreement, in terms of which examples were picked, may actually be lower than the agreement statistics suggest. However, some of the examples could be considered errors rather than Non-Standard English, for example ‘testoustrone infueled’ and ‘eachother’. This is a difficult variable for raters to code as it depends very much on their judgement. It was recommended that the examples be coded into types of Non-Standard English to investigate further the frequency of different types.
The ‘subordination’ ICC was 0.49 and the simple agreement IRR was 0.2, which is lower than ideal; the judges remarked that they found this variable difficult to judge. After much discussion, and because the ICC statistic is just acceptable, it was agreed to proceed with the data from this variable. However, further work is likely to investigate why it is so difficult to judge.
The ‘capitalisation sentence – omitted’ ICC was 0.58, and analysis of the data showed that 13 of the 21 scripts had perfect agreement (evidenced by the simple agreement statistic of 0.77). Although borderline, we felt that this variable should be considered for further analysis.
Discussion of agreement on the whole-text coding frame
Table 5 on page 19 shows the simple agreement, Kappa and ICC statistics for the nine variables used in the whole-text sample. The ICC/Kappa column displays ICC statistics except for the three categorical variables, which used the Kappa statistic. One variable had aberrant ICC values due to a lack of variation in the ratings; this is denoted by an asterisk (*). The agreement is not as strong for the whole-text variables, with only two (number and coherence of paragraphs) being considered to have good agreement.
1. This is a different measure than was used in 2007. In the previous report, the ICC was the percentage of variance in mean scores across raters that was attributable to which script was being looked at. This yields somewhat higher ICC coefficients.
RESEARCH MATTERS – SPECIAL ISSUE 4: Aspects of Writing 1980–2014 | 17