
January 2012 Volume 15 Number 1 - Educational Technology ...



their easily understandable devices. No complicated calculations were needed in their estimations, and they were always sensitive to changes in response patterns; in other words, they provide a more accurate reflection of changes in people's response patterns. Moreover, unlike the IRT-based indices, they are suitable for use in small samples, such as the students in one class. However, the cutoff settings of group-based indices still necessitate caution. As mentioned in the problem statement, the cutoffs of group-based indices (except for the BW indices) are based on certain empirical data or rules of thumb. Subjective criteria for cutoffs can cause the thresholds for detecting aberrances to be reached too easily, so that "spurious" high detection rates may occur. By contrast, the BW indices performed as well as Wc&Bs, SCI, and MCI: they had good detection rates and outperformed the other IRT-based indices. They also exhibited the greatest stability across the AT and AS conditions among all indices. In addition, owing to their sensitivity to changes in people's response patterns (Huang, 2006) and their established objective cutoffs (Huang, 2007), the BW indices can provide more conservative and reliable results for small sample sizes.

The BW indices are strongly recommended for teachers who wish to diagnose students' learning in class. Through the B index and the W index, respectively, teachers can identify who tends to guess and who tends to slip. A student with a high value on the B index may have obtained a "spuriously high" score attributable to guessing or to creative thinking; in contrast, a student with a high value on the W index may warrant more concern about his or her carelessness.

This study still has a few limitations. First, the AP factor manipulated in this study showed only a slight effect on the indices. The reason might be the selection of persons whose correct responses fell between 20% and 80% of the total items, imposed to ensure that no full or null scores were generated; future studies might use all persons. Second, only the AT, AS, and AP conditions were manipulated in this study. To examine other sources of an index's detection power, future studies could add other factors, such as test length and sample size, and other IRT models. Finally, the comparisons of detection power among the aberrance indices in this study are based on "spuriously aberrant response patterns." To reflect examinees' response patterns more authentically, one might compare detection power based on "true response patterns."
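To illustrate why group-based indices of this family need no complicated calculations, the sketch below implements the modified caution index (MCI; Harnisch & Linn, 1981), one of the comparison indices above. It is an illustrative reconstruction from the published formula, not the study's own code: the index compares a person's observed pattern against the Guttman pattern (correct on the easiest items) and the reversed Guttman pattern for the same number-correct score, using only classical item proportions correct.

```python
# Illustrative sketch (not the paper's code): modified caution index
# (MCI; Harnisch & Linn, 1981). MCI = 0 for a perfect Guttman pattern,
# MCI = 1 for a fully reversed one; values near 1 flag aberrant patterns.
import numpy as np

def modified_caution_index(responses: np.ndarray) -> np.ndarray:
    """responses: persons-by-items matrix of 0/1 item scores."""
    p = responses.mean(axis=0)          # item easiness (proportion correct)
    order = np.argsort(-p)              # items sorted from easiest to hardest
    p_sorted = p[order]
    u = responses[:, order]
    r = u.sum(axis=1)                   # each person's number-correct score
    csum = np.concatenate([[0.0], np.cumsum(p_sorted)])
    total = csum[-1]
    easiest = csum[r]                   # sum of p over the r easiest items
    hardest = total - csum[len(p) - r]  # sum of p over the r hardest items
    observed = u @ p_sorted             # sum of p over items answered correctly
    # Persons with full or null scores yield a zero denominator (NaN);
    # the study excluded such persons (scores kept within 20%-80%).
    with np.errstate(invalid="ignore", divide="ignore"):
        return (easiest - observed) / (easiest - hardest)
```

No IRT calibration or iterative estimation is involved, which is what makes such indices usable with a single classroom's worth of data.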

Acknowledgements

The author would like to thank the National Science Council (NSC) in Taiwan for financial support, and sincere appreciation is extended to the journal reviewers for their helpful recommendations.

References

Birenbaum, M. (1985). Comparing the effectiveness of several IRT based appropriateness measures in detecting unusual response patterns. Educational and Psychological Measurement, 45, 523-534.

D'Costa, A. (1993a, April). Extending the Sato caution index to define the within and beyond ability caution indexes. Paper presented at the convention of the National Council on Measurement in Education, Atlanta, GA.

D'Costa, A. (1993b, April). The validity of the W, B and Sato caution indexes. Paper presented at the Seventh International Objective Measurement Conference, Atlanta, GA.

Drasgow, F., Levine, M. V., & Williams, E. A. (1985). Appropriateness measurement with polytomous item response models and standardized indices. British Journal of Mathematical and Statistical Psychology, 38, 67-86.

Drasgow, F., & Levine, M. V. (1986). Optimal detection of certain forms of inappropriate test scores. Applied Psychological Measurement, 10, 59-67.

Drasgow, F., Levine, M. V., & McLaughlin, M. E. (1987). Detecting inappropriate test scores with optimal and practical appropriateness indices. Applied Psychological Measurement, 11, 59-79.

Drasgow, F., Levine, M. V., & McLaughlin, M. E. (1991). Appropriateness measurement for some multidimensional test batteries. Applied Psychological Measurement, 15, 171-191.

Guttman, L. (1944). A basis for scaling qualitative data. American Sociological Review, 9, 139-150.

Harnisch, D. L., & Linn, R. L. (1981). Analysis of item response patterns: Questionable test data and dissimilar curriculum practices. Journal of Educational Measurement, 18, 133-146.

Huang, F. Y. (2003). Investigating the relationships of written computation, symbolic representation, and pictorial representation among sixth grade students in Taiwan. Unpublished master's thesis, National Chiayi University, Taiwan.

Huang, T. W. (2006). Aberrant response diagnoses by the Beyond-Ability-Surprise index (B*) and the Within-Ability-Concern index (W*). Proceedings of the 2006 Hawaii International Conference on Education, Honolulu, HI, pp. 2853-2865.

