
January 2012 Volume 15 Number 1 - Educational Technology ...


2   .56   .56
3   .64   .64

Note: a = 10% aberrant severity; b = 20% aberrant severity; c = 30% aberrant severity.

***p < .001.

Conclusion

In conclusion, group-based indices appeared to perform better than IRT-based indices. The Wc&Bs, SCI, MCI, and BW indices appeared to outperform the other indices across all conditions. The NCI, lz, ECI4z, and OUTFITz indices showed mediocre performance, and the ECI2z and INFITz indices had the lowest detection rates.
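The comparison above is framed in terms of detection rates. One common reading, assumed here rather than taken from the study, is the share of examinees with simulated aberrant responses that an index flags as misfitting. The sketch below computes that share; the function name `detection_rate` and the flagging rule (a hypothetical lower-tail cutoff of -1.645, as might be used for a standardized index such as lz) are illustrative only.

```python
from typing import Callable, Sequence


def detection_rate(
    index_values: Sequence[float],
    is_aberrant: Sequence[bool],
    flag: Callable[[float], bool],
) -> float:
    """Proportion of truly aberrant examinees that the index flags.

    index_values: one person-fit value per examinee
    is_aberrant:  True for examinees whose responses were made aberrant
    flag:         decision rule (hypothetical), e.g. lambda v: v < -1.645
    """
    aberrant_values = [v for v, a in zip(index_values, is_aberrant) if a]
    if not aberrant_values:
        raise ValueError("no aberrant examinees to detect")
    flagged = sum(1 for v in aberrant_values if flag(v))
    return flagged / len(aberrant_values)


# Toy data (hypothetical): three aberrant examinees, two fall below the cutoff.
values = [-2.1, 0.3, -1.9, -0.5]
aberrant = [True, False, True, True]
print(detection_rate(values, aberrant, lambda v: v < -1.645))  # 0.6666666666666666
```

For indices where large values signal misfit, such as the caution indices, the rule would instead be an upper-tail cutoff.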

Generally, the finding that group-based indices are superior to IRT-based indices is consistent with Karabatsos's (2003) comparison of thirty-six indices. The superior detection power of Wc&Bs is supported by Lu, Huang, and Fan (2007), and the good performance of the MCI is consistent with the findings of D’Costa (1993b) and Rudner (1983). Among the IRT-based indices, the good performance of lz is consistent with the findings of Birenbaum (1985), Drasgow et al. (1987), and Li and Olejnik (1997). The consistently good performance of ECI4z agrees with Birenbaum (1985), Noonan, Boss, and Gessaroli (1992), Li and Olejnik (1997), and Soel (1998), although it is not supported by Drasgow et al. (1987). The poor performance of ECI2z in this study, however, is inconsistent with other studies, for example, Birenbaum (1985), Drasgow et al. (1987), Li and Olejnik (1997), and Soel (1998). Regarding the stability of the indices, lz, ECI4z, and OUTFITz appeared unstable across the three aberrance type conditions, and the NCI and ECI4z indices were unstable across the three severity level conditions. This is consistent with previous studies, for example, Lu, Huang, and Fan (2007), Drasgow et al. (1987), Li and Olejnik (1997), and Soel (1998), and suggests that the performance of these indices is condition-dependent. By contrast, the BW indices exhibited the greatest stability across the aberrance type (AT) and aberrance severity (AS) conditions.

The group-based indices may have performed better than the IRT-based indices because they are oriented toward response patterns, whereas the IRT-based indices are oriented toward response probabilities. When we permuted data from the original matrix, we changed examinees' response patterns, but this may not have changed the probability of a given examinee answering an item correctly. The parameters estimated in the group-based indices depend on other examinees' relative response patterns and are therefore sensitive to changes in those patterns. The parameters estimated in the IRT-based indices, by contrast, are absolute across items and across persons, so they are less sensitive to changes in response patterns. Thus, under the aberrant conditions, the IRT-based indices were less effective than the group-based indices.
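To make the group-based side of this argument concrete, the sketch below computes Sato's caution index (SCI), one of the group-based indices named above, in its usual covariance-ratio form, SCI = 1 - cov(x, p) / cov(x*, p). Here p holds the item proportions correct estimated from the whole group, and x* is the Guttman pattern with the same total score. It assumes dichotomous 0/1 scoring; the function names and the toy matrix are hypothetical, and the study's own computation may differ in details such as tie handling. Permuting one examinee's responses keeps the total score but changes the pattern, and the index moves with it.

```python
from statistics import mean


def item_p_values(matrix):
    """Proportion of examinees answering each item correctly (group-based)."""
    n_people = len(matrix)
    return [sum(row[j] for row in matrix) / n_people for j in range(len(matrix[0]))]


def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def sato_caution_index(responses, p):
    """SCI = 1 - cov(x, p) / cov(x*, p).

    x* is the Guttman pattern with the same total score: the examinee's
    correct answers placed on the items with the highest p-values.
    """
    score = sum(responses)
    easiest_first = sorted(range(len(p)), key=lambda j: -p[j])
    guttman = [0] * len(p)
    for j in easiest_first[:score]:
        guttman[j] = 1
    denom = cov(guttman, p)
    if denom == 0:  # all correct, all wrong, or all items equally difficult
        return 0.0
    return 1 - cov(responses, p) / denom


# Toy 0/1 matrix (hypothetical): 5 examinees x 4 items.
original = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 0],
]
sci_before = sato_caution_index(original[0], item_p_values(original))

# Permute examinee 0's responses: same total score (3), different pattern.
permuted = [[0, 1, 1, 1]] + original[1:]
sci_after = sato_caution_index(permuted[0], item_p_values(permuted))

print(sci_before, sci_after)  # 0.0 and approximately 2.0
```

Because p is recomputed from the group after the permutation, the index reflects the examinee's pattern relative to everyone else's, which is the property the paragraph above attributes to group-based indices.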

Specifically, the poor performance of ECI2z may be due to how its formula is constructed. ECI2 measures the similarity of an observed response pattern to the group probabilities of correct answers. In this study, there were only approximately 33 examinees in a class responding to 16 items, and the person-fit indices were estimated from one unit data matrix at a time. The small sample size may have produced only negligible changes in the central ordered responses, so individual response patterns may have remained similar to the group response patterns. Consequently, the covariance between a person's observed response vector and the vector of group probabilities of correct answers may have been large in this study. This would lead to small values of ECI2z and hence to its insensitivity in detecting aberrance. In contrast, ECI4z measures the similarity of an observed pattern to the individual probabilities of correct answers. Because these individual probabilities were estimated through an IRT model, ECI4z was less influenced by the small sample size.
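The ECI2 and ECI4 contrast can be illustrated with one formulation of Tatsuoka's extended caution indices that is commonly cited in the person-fit literature, assumed here: ECI2 = 1 - cov(x, p) / cov(P, p) and ECI4 = 1 - cov(x, P) / cov(p, P), where x is the examinee's response vector, p the group proportions correct, and P the examinee's model-based probabilities. The sketch assumes a Rasch model with known (hypothetical) ability and item difficulties and computes only the unstandardized indices; the standardized ECI2z and ECI4z used in the study additionally require the model-implied mean and variance, which are not shown.

```python
import math
from statistics import mean


def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def rasch_prob(theta, b):
    """Rasch-model probability of a correct answer (assumed model)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def eci2_eci4(x, p, theta, difficulties):
    """Unstandardized ECI2 and ECI4 for one examinee.

    x: the examinee's 0/1 responses
    p: group proportions correct for each item
    theta, difficulties: hypothetical Rasch ability and item difficulties
    """
    P = [rasch_prob(theta, b) for b in difficulties]  # individual probabilities
    eci2 = 1 - cov(x, p) / cov(P, p)  # x compared with the group probabilities
    eci4 = 1 - cov(x, P) / cov(p, P)  # x compared with the individual probabilities
    return eci2, eci4


p = [0.9, 0.7, 0.5, 0.3]
difficulties = [-1.5, -0.5, 0.5, 1.5]
typical = eci2_eci4([1, 1, 0, 0], p, 0.0, difficulties)
aberrant = eci2_eci4([0, 0, 1, 1], p, 0.0, difficulties)
print(typical, aberrant)  # both indices are larger for the aberrant pattern
```

Because cov(x, p) sits in the subtracted ratio, the larger it is, the smaller ECI2 becomes, which is the mechanism the explanation above appeals to.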

The contrast between OUTFITz and INFITz is also notable. Linacre and Wright (1994) offer a possible explanation: OUTFITz is outlier-sensitive and dominated by unexpected outlying responses, whereas INFITz is inlier-sensitive and dominated by unexpected inlying patterns. With a small sample and a short test, the lack of significant changes in the central ordered responses may not have produced many unexpected inlying patterns; thus, INFITz was insensitive. OUTFITz, being outlier-sensitive, suited this study well because unexpected outliers occurred frequently.
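The outlier and inlier sensitivity described by Linacre and Wright (1994) follows from how the two mean squares weight residuals. The sketch below computes the unstandardized Rasch INFIT and OUTFIT mean squares for one examinee, assuming known (hypothetical) ability and item difficulties; the standardized INFITz and OUTFITz used in the study apply a further transformation that is omitted here. The toy examinee misses a single very easy item, an outlying surprise.

```python
import math


def rasch_prob(theta, b):
    """Rasch-model probability of a correct answer (assumed model)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def infit_outfit_mnsq(x, theta, difficulties):
    """Unstandardized INFIT and OUTFIT mean squares for one examinee."""
    P = [rasch_prob(theta, b) for b in difficulties]
    variances = [pj * (1 - pj) for pj in P]  # binomial variance per item
    sq_resid = [(xj - pj) ** 2 for xj, pj in zip(x, P)]
    # OUTFIT: plain mean of squared standardized residuals, so one very
    # unexpected response on an off-target item can dominate it.
    outfit = sum(r / v for r, v in zip(sq_resid, variances)) / len(x)
    # INFIT: residuals weighted by item variance (information), so responses
    # on items near the examinee's ability count most.
    infit = sum(sq_resid) / sum(variances)
    return infit, outfit


# Hypothetical examinee at theta = 0 who misses only a very easy item (b = -3)
# and otherwise answers as expected.
infit, outfit = infit_outfit_mnsq([0, 1, 1, 0, 0], 0.0, [-3, -1, 0, 1, 3])
print(round(infit, 2), round(outfit, 2))  # 1.78 4.37
```

In this toy case a single off-target miss raises OUTFIT to roughly 4.4 while INFIT stays near 1.8, consistent with OUTFIT reacting to outliers and INFIT to inlying patterns.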

On the basis of the above findings, group-based indices, at least the Wc&Bs, SCI, MCI, and BW indices examined in this study, should not be overlooked. They outperformed well-known IRT-based indices due to their superior detection powers and
