January 2012 Volume 15 Number 1 - Educational Technology ...
their easily understandable devices. No complicated calculations are needed in their estimation, and they are always sensitive to changes in response patterns; in other words, they provide a more accurate reflection of changes in people's response patterns. Moreover, unlike the IRT-based indices, they are suitable for use in small samples, such as the students in a single class. However, the cutoff settings of group-based indices still warrant caution. As mentioned in the problem statement, the cutoffs of the group-based indices (except for the BW indices) are based on particular empirical data or on rules of thumb. Subjective cutoff criteria could cause the thresholds for detecting aberrance to be reached too easily, producing "spuriously" high detection rates. By contrast, the BW indices performed as well as Wc&Bs, SCI, and MCI: they had good detection rates and outperformed the other IRT-based indices. They also exhibited the greatest stability across the AT and AS conditions among all indices. In addition, owing to their sensitivity to changes in people's response patterns (Huang, 2006) and their established objective cutoffs (Huang, 2007), the BW indices can provide more conservative and reliable results for small sample sizes. The BW indices are therefore strongly recommended for teachers who wish to diagnose students' learning in class. Teachers can identify who tends to guess and who tends to slip through the B index and the W index, respectively. A student with a high B-index value may have obtained a "spuriously high" score attributable to guessing or to creative thinking; in contrast, a student with a high W-index value may deserve more attention regarding his or her carelessness.

This study has a few limitations. First, the AP factor manipulated in this study had only a slight effect on the indices. The reason may be the selection of only those persons whose correct responses fell between 20% and 80% of the total items, which ensured that no perfect or zero scores were generated. Future studies might include all persons. Second, only the AT, AS, and AP conditions were manipulated in this study. To examine other sources of an index's detection power, future studies could add other factors, such as test length and sample size, as well as other IRT models. Finally, the comparisons of detection power among the aberrance indices in this study are based on "spuriously generated aberrant response patterns." To reflect examinees' response patterns more authentically, one might compare detection power based on "true response patterns."
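The diagnostic logic recommended to teachers above, flagging unexpected successes on items beyond a student's ability (the B-index idea) and unexpected failures on items within it (the W-index idea), can be sketched with a simplified Guttman-style count. The code below is an illustrative sketch only: it approximates item easiness by the classroom proportion correct and treats the examinee's number-correct score as the ability boundary. It does not reproduce the published B* and W* formulas or cutoffs (Huang, 2006, 2007); the function name `bw_counts` is hypothetical.

```python
def bw_counts(response_matrix, person):
    """Illustrative Guttman-style person-fit counts, NOT the published
    B*/W* indices. Returns (b_like, w_like) for one examinee:
    b_like = surprising successes on items beyond the person's ability,
    w_like = surprising failures on items within it.
    response_matrix: list of 0/1 lists, persons x items."""
    n_persons = len(response_matrix)
    n_items = len(response_matrix[0])
    # Item easiness approximated by the proportion of examinees correct.
    easiness = [sum(row[j] for row in response_matrix) / n_persons
                for j in range(n_items)]
    score = sum(response_matrix[person])
    # Under a perfect Guttman pattern, a person with this score answers
    # exactly the `score` easiest items correctly.
    order = sorted(range(n_items), key=lambda j: -easiness[j])
    within = set(order[:score])  # items "within ability"
    b_like = sum(1 for j in range(n_items)
                 if j not in within and response_matrix[person][j] == 1)
    w_like = sum(1 for j in within
                 if response_matrix[person][j] == 0)
    return b_like, w_like
```

For a student whose responses fit the Guttman ordering, both counts are zero; a student who misses an easy item but answers the hardest item correctly would show one W-like failure (possible carelessness) and one B-like success (possible guessing).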
Acknowledgements

The author thanks the National Science Council (NSC) in Taiwan for financial support and expresses sincere appreciation to the journal reviewers for their helpful recommendations.
References

Birenbaum, M. (1985). Comparing the effectiveness of several IRT based appropriateness measures in detecting unusual response patterns. Educational and Psychological Measurement, 45, 523-534.

D'Costa, A. (1993a, April). Extending the Sato caution index to define the within and beyond ability caution indexes. Paper presented at the convention of the National Council on Measurement in Education, Atlanta, GA.

D'Costa, A. (1993b, April). The validity of the W, B and Sato Caution indexes. Paper presented at the Seventh International Objective Measurement Conference, Atlanta, GA.

Drasgow, F., Levine, M. V., & Williams, E. A. (1985). Appropriateness measurement with polytomous item response models and standardized indices. British Journal of Mathematical and Statistical Psychology, 38, 67-86.

Drasgow, F., & Levine, M. V. (1986). Optimal detection of certain forms of inappropriate test scores. Applied Psychological Measurement, 10, 59-67.

Drasgow, F., Levine, M. V., & McLaughlin, M. E. (1987). Detecting inappropriate test scores with optimal and practical appropriateness indices. Applied Psychological Measurement, 11, 59-79.

Drasgow, F., Levine, M. V., & McLaughlin, M. E. (1991). Appropriateness measurement for some multidimensional test batteries. Applied Psychological Measurement, 15, 171-191.

Guttman, L. (1944). A basis for scaling qualitative data. American Sociological Review, 9, 139-150.

Harnisch, D. L., & Linn, R. L. (1981). Analysis of item response patterns: Questionable test data and dissimilar curriculum practices. Journal of Educational Measurement, 18, 133-146.

Huang, F. Y. (2003). Investigating the relationships of written computation, symbolic representation, and pictorial representation among sixth grade students in Taiwan. Unpublished master's thesis, National Chiayi University, Taiwan.

Huang, T. W. (2006). Aberrant response diagnoses by the Beyond-Ability-Surprise index (B*) and the Within-Ability-Concern index (W*). Proceedings of the 2006 Hawaii International Conference on Education, Honolulu, Hawaii, pp. 2853-2865.