January 2012 Volume 15 Number 1 - Educational Technology ...
2 .56 .56<br />
3 .64 .64<br />
Note: a = 10% aberrant severity; b = 20% aberrant severity; c = 30% aberrant severity.<br />
***p < .001.<br />
Conclusion<br />
In conclusion, group-based indices seemed to perform better than IRT-based indices. The Wc&Bs, SCI, MCI, and<br />
BW indices seemed to dominate the other indices across all conditions. The NCI, lz, ECI4z, and OUTFITz indices<br />
exhibited mediocre performance, and the ECI2z and INFITz indices had the lowest detection rates.<br />
Generally, the findings of the superiority of the group-based indices over the IRT-based indices are consistent with<br />
the study comparing thirty-six indices by Karabatsos (2003). The superior detection power of Wc&Bs is supported<br />
by the study of Lu, Huang, and Fan (2007). The good performance of the MCI is consistent with the findings of the<br />
studies by D’Costa (1993b) and Rudner (1983). Regarding the IRT-based indices, the finding of good performance<br />
of lz is consistent with the findings of Birenbaum (1985), Drasgow et al. (1987), and Li and Olejnik (1997).<br />
Although the consistently good performance of ECI4z is consistent with the findings of Birenbaum (1985), Noonan,<br />
Boss, and Gessaroli (1992), Li and Olejnik (1997), and Soel (1998), it is not supported by Drasgow et al. (1987). The<br />
poor performance of ECI2z in this study is not consistent with the literature, for example, Birenbaum (1985),<br />
Drasgow et al. (1987), Li and Olejnik (1997), and Soel (1998). With respect to the stability of the indices<br />
in this study, lz, ECI4z, and OUTFITz seemed unstable across the three aberrance type conditions. The NCI and<br />
ECI4z indices were unstable across the three severity level conditions. This is consistent with the previous literature, for<br />
example, Lu, Huang, and Fan (2007), Drasgow et al. (1987), Li and Olejnik (1997), and Soel (1998). This seems to<br />
indicate that these indices were condition-dependent. However, the BW indices exhibited the most stability across the AT<br />
and AS conditions.<br />
The reason the group-based indices performed better than the IRT-based indices may be that group-based indices are<br />
oriented toward response patterns, whereas the IRT-based indices are oriented toward response probabilities. When we<br />
permuted data from the original matrix, we changed people’s response patterns, but this may not have changed the<br />
probability of someone answering an item correctly. The parameters estimated in the group-based indices are based<br />
on other people’s relative response patterns, and therefore would be sensitive to changes in response patterns. On the<br />
other hand, the parameters estimated in the IRT-based indices are absolute across items and across persons; they are<br />
not as sensitive to changes in response patterns. Thus, when aberrant conditions were implemented, the IRT-based<br />
indices were less effective than group-based indices.<br />
Specifically, the poor performance of ECI2z may be due to its formula. ECI2 measures the<br />
similarity of an observed pattern to group probabilities of correct answers. In this study, there were only<br />
approximately 33 examinees in a class responding to 16 items, and the person-fit indices were estimated on the basis<br />
of the unit data matrix at one time. The small sample size may have resulted in insignificant changes in the central<br />
ordered responses. Individual response patterns may have been similar to the group response patterns. Thus, the<br />
covariance of the observed response vector for a person and the vector for group probabilities for correct answers<br />
may be large in this study. This would lead to small values of ECI2z and result in its insensitivity when detecting<br />
aberrances. In contrast, ECI4z measures the similarity of an observed pattern to individual probabilities of correct<br />
answers. Individual probabilities for correct answers were measured through an IRT model; thus, ECI4z was less<br />
influenced by the small sample size.<br />
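The contrast between the two indices can be illustrated with a minimal sketch. The source does not give the formulas, so the sketch assumes the unstandardized forms commonly attributed to Tatsuoka's extended caution indices, ECI2 = 1 - Cov(X, G) / Cov(P, G) and ECI4 = 1 - Cov(X, P) / Cov(G, P), where X is a person's observed response vector, P is that person's model-based probabilities of correct answers, and G is the vector of group-average probabilities. The Rasch model, the item difficulties, the abilities, and all function names are hypothetical choices made for illustration, and the standardization that yields ECI2z and ECI4z is omitted.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct answer under a Rasch model (an assumption here)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def cov(u, v):
    """Population covariance of two equal-length vectors."""
    mu = sum(u) / len(u)
    mv = sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


def eci2(x, p, g):
    """Assumed form: compares the observed pattern with GROUP probabilities."""
    return 1.0 - cov(x, g) / cov(p, g)


def eci4(x, p, g):
    """Assumed form: compares the observed pattern with INDIVIDUAL probabilities."""
    return 1.0 - cov(x, p) / cov(g, p)


# Hypothetical toy data: five items ordered from easy to hard, three examinees.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
abilities = [-1.0, 0.0, 1.0]

# Individual probabilities for every examinee, then the group average per item.
prob_matrix = [[rasch_prob(t, b) for b in difficulties] for t in abilities]
g = [sum(col) / len(col) for col in zip(*prob_matrix)]

# Examinee of average ability: one model-consistent and one reversed pattern.
p = prob_matrix[1]
fit_pattern = [1, 1, 1, 0, 0]
aberrant_pattern = [0, 0, 1, 1, 1]

for label, x in (("fit", fit_pattern), ("aberrant", aberrant_pattern)):
    print(label, round(eci2(x, p, g), 3), round(eci4(x, p, g), 3))
```

Under these assumptions, a pattern that covaries strongly with the group probabilities produces a small ECI2 value, which matches the argument above: when individual patterns resemble the group pattern, ECI2 has little room to flag aberrance.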
It was also interesting to note the contrast between OUTFITz and INFITz. Linacre and Wright<br />
(1994) provide a possible reason: OUTFITz is outlier-sensitive and dominated by unexpected outliers. INFITz is<br />
dominated by unexpected inlying patterns and is inlier-sensitive. Because of the small sample size and the short<br />
test, the lack of significant changes in the central ordered responses might not have produced many unexpected inlying<br />
patterns; thus, INFITz was not sensitive. OUTFITz, being outlier-sensitive, suited this study<br />
because unexpected outliers occurred frequently.<br />
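The difference in sensitivity can be sketched with the standard mean-square forms of the two statistics under a Rasch model: the outfit mean square is the unweighted average of squared standardized residuals, while the infit mean square weights the squared residuals by the item information P(1 - P). The sketch below is illustrative only. The item difficulties, the ability value, and the function names are hypothetical, and the further transformation of the mean squares into the standardized OUTFITz and INFITz values is not shown.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct answer under a Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def fit_mean_squares(x, theta, difficulties):
    """Return (outfit, infit) mean squares for one person's 0/1 responses."""
    p = [rasch_prob(theta, b) for b in difficulties]
    w = [pi * (1.0 - pi) for pi in p]                 # response variances
    sq = [(xi - pi) ** 2 for xi, pi in zip(x, p)]     # squared residuals
    # Outfit: every item counts equally, so extreme items can dominate.
    outfit = sum(s / wi for s, wi in zip(sq, w)) / len(x)
    # Infit: information-weighted, so items near the person's level dominate.
    infit = sum(sq) / sum(w)
    return outfit, infit


# Hypothetical toy data: seven items from very easy to very hard, ability 0.
difficulties = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
theta = 0.0

expected_pattern = [1, 1, 1, 1, 0, 0, 0]
outlier_pattern = [0, 1, 1, 1, 0, 0, 0]   # misses only the easiest item

outfit_expected, infit_expected = fit_mean_squares(expected_pattern, theta, difficulties)
outfit_outlier, infit_outlier = fit_mean_squares(outlier_pattern, theta, difficulties)

print("expected:", round(outfit_expected, 2), round(infit_expected, 2))
print("outlier: ", round(outfit_outlier, 2), round(infit_outlier, 2))
```

In this toy case a single unexpected miss on a very easy item inflates the outfit mean square far more than the infit mean square, which is the behavior described by Linacre and Wright (1994).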
On the basis of the above findings, group-based indices, at least Wc&Bs, SCI, MCI, and BW in this study,<br />
should not be overlooked. They outperformed well-known IRT-based indices due to their superior detection power and<br />