April 2012 Volume 15 Number 2 - Educational Technology & Society

More documents

Recommendations

Info

Yen, Y.-C., Ho, R.-G., Liao, W.-W., & Chen, L.-J. (2012). Reducing the Impact of Inappropriate Items on Reviewable Computerized Adaptive Testing. Educational Technology & Society, 15 (2), 231–243. Reducing the Impact of Inappropriate Items on Reviewable Computerized Adaptive Testing Yung-Chin Yen, Rong-Guey Ho, Wen-Wei Liao and Li-Ju Chen Graduate Institute of Information and Computer Education, National Taiwan Normal University, 162, He-ping East Road, Section 1, Taipei 106, Taiwan // Tel: +886 2 77343921 // Fax: +886 2 23512772 // scorpio@ice.ntnu.edu.tw // hrg@ntnu.edu.tw // ljchen@ice.ntnu.edu.tw // abard@ice.ntnu.edu.tw (Submitted September 20, 2010; Revised May 10, 2011; Accepted August 3, 2011) ABSTRACT In a test, the testing score would be closer to examinee’s actual ability when careless mistakes were corrected. In CAT, however, changing the answer of one item in CAT might cause the following items no longer appropriate for estimating the examinee’s ability. These inappropriate items in a reviewable CAT might in turn introduce bias in ability estimation and decrease precision. An early proposed solution to this problem was rearrangement procedure. The purpose of this study was to implement the 4PL IRT model to reduce the estimation bias introduced by inappropriate items in reviewable CAT. The results of this study indicated that the 4PL IRT model could significantly lower the estimation bias for reviewable CAT, and, while it incorporates with rearrangement procedure, provide more accurate ability estimation. Also, the efficiency of reviewable CAT was promoted by introducing both 4PL IRT model and rearrangement procedure. Keywords Item response theory (IRT), Computerized adaptive testing (CAT), Reviewable CAT, Four-parameter logistical (4PL) IRT model, Rearrangement procedure Introduction When developing a computerized adaptive testing (CAT), an important controversy was whether examinees should be allowed to review and modify previous responses (Bowles & Pommerich, 2001; Wise & Kingsbury, 2000). In traditional paper-and-pencil (P&P) tests, most examinees took review for granted. They have been trained using the remaining time to review items and catch careless errors since they entered elementary school. In most CATs, however, opportunities for item review and answer change were far less common (Mills & Stocking, 1995; Papanastasiou & Reckase, 2007; Stone & Lunz, 1994; Vicino & Moreno, 1997; Vispoel, 1998; Wise, 1996). Since the testing score would be closer to examinee’s actual ability when careless mistakes were corrected, the prohibition of reviewing items in CAT might lead to underestimating examinees’ ability (Lunz, Bergstrom, & Wright, 1992; Vispoel, Hendrickson, & Bleiler, 2000; Waddell & Blankenship, 1994). A reviewable CAT was a CAT in which examinees were allowed to review and change the answers of previous items. The underlying hypothesis of reviewable CAT was that after rereading or rethinking an item, the examinees might correct the careless mistake they made. This hypothesis afterwards led to the fact that even high-ability students might on occasion miss items that they should have answered correctly. However, changing the answer of one item in CAT might cause the following items no longer appropriate for estimating the examinee’s ability. These inappropriate items in a reviewable CAT might in turn introduce bias in ability estimation and decrease precision. Early proposed solutions to this problem included limiting review and rearrangement procedure (Chen, 2009; Papanastasiou, 2002). The same situation was also seen in traditional CAT. In virtue of the underlying characteristics of the traditional item response theory (IRT) model, if a high-ability examinee misses early items carelessly in a test, the following items would be too easy to estimate his/her true ability appropriately. To cope with the underestimation problem, Barton and Lord (1981) proposed the four-parameter logistical (4PL) model allowing a high-ability student to miss an easy item without having his/her ability drastically lowered. In contrast to other well-known IRT models, however, little attention has been given to the 4PL model. The purpose of this study was to implement the 4PL IRT model to reduce the estimation bias introduced by inappropriate items in reviewable CAT. It was also hypothesized that the 4PL model would perform better in alleviating the inappropriate item problems of reviewable CAT than the rearrangement procedure, or at least, improve the performance of rearrangement procedure by incorporating with it. ISSN 1436-4522 (online) and 1176-3647 (print). © International Forum of Educational Technology & Society (IFETS). The authors and the forum jointly retain the copyright of the articles. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear the full citation on the first page. Copyrights for components of this work owned by others than IFETS must be honoured. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from the editors at kinshuk@ieee.org. 231
4PL IRT model According to the number of parameter describing the item, IRT model could be generally classified into three widely used categories: one-parameter logistic (1PL) model, two-parameter logistic (2PL) model, and 3PL model. In 1PL model, the probability that an examinee with ability θ could answer an item with difficulty b correctly could be mathematically expressed as 1 P1PL (�) � 1� exp[�D(� � b)] . (1) The mathematical form of the 2PL model could be written as 1 P 2PL (�) � 1� exp[�Da(� � b)] (2) while new parameter a was called the discrimination parameter which allowed an item to discriminate differently among the examinees (Harvey & Hammer, 1999). In both 1PL and 2PL models, the probability of passing ranges from 0 to 1 as θ goes from -∞ to ∞. On a multiple-choice test, however, the probability of choosing the correct answer did not approach 0 even for low-ability students. Birnbaum (1968) introduced a lower asymptote to handle the situation in which examinees either guessed totally randomly or answered on the basis of their knowledge. The resulting 3PL model was P 3PL(�) � c � (1�c)P 2PL(�) , (3) where the lower asymptote c represented the probability that an extremely low ability examinee would get the item correct. In 1PL and 2PL models, the probability that a low-ability student would answer a hard item correctly should approach zero while a high-ability student should answer an easy item with probability approaching one. It was conceivable, however, this assumption might not always hold, since an examinee who knew nothing still had a chance to choose the correct answer in a multiple-choice test. Moreover, the probability might be higher for an examinee who possessed partial knowledge (Bar-Hillel, Budescu, & Attali, 2005; Burton, 2002; Gardner-Medwin & Gahan, 2003; Yen, et al., 2010). On the other hand, high-ability students who are anxious, distracted by poor testing conditions, unfamiliar with computers, careless, or misread the question, might on occasion miss items that they otherwise should have answered correctly (Hockemeyer, 2002; Rulison & Loken, 2009). Barton and Lord (1981) introduced an upper-asymptote parameter, expressed by the Greek letter delta (δ), into the 3PL model: P 4PL (�) � c � (� �c)P 2PL (�) . (4) While P 2PL (�) ranges from zero to one, P 4PL (�) ranges from the lower asymptote, c, to the upper asymptote parameter, δ, for item-specific “carelessness”. Figure 1 illustrates a typical ICC for the 4PL IRT model with b=0, a=1, c = 0.2 and δ = 0.9. Figure 1. A typical ICC for the 4PL model with b=0, a=1, c = 0.2, and δ = 0.9 To evaluate whether changing the upper asymptote improved scoring on standardized tests, Barton and Lord compared the 3PL model and 4PL model under two upper-asymptote values δ = .99 and δ = .98 by re-estimating test 232
Page 1 and 2:
April 2012 Volume 15 Number 2
Page 3 and 4:
Supporting Organizations Centre for
Page 5 and 6:
Contextualizing a MALL: Practice De
Page 7 and 8:
Kop, R. (2012). The Unexpected Conn
Page 9 and 10:
Human mediation and information flo
Page 11 and 12:
elevant information based on some k
Page 13 and 14:
of learners. A search facilitated t
Page 15 and 16:
information as human mediation mean
Page 17 and 18:
Blau, I., & Barak, A. (2012). How D
Page 19 and 20:
examined whether the readiness to p
Page 21 and 22:
Table 5 presents the ANOVA for the
Page 23 and 24:
However, in discussing a non-sensit
Page 25 and 26:
deviations showed smaller variance,
Page 27 and 28:
actual participation discussing sen
Page 29 and 30:
Mesch, G., & Elgali, Z. (2009). Soc
Page 31 and 32:
Although prior research on overall
Page 33 and 34:
with a score of 39 and below as “
Page 35 and 36:
Table 2 demonstrates mean scores an
Page 37 and 38:
Discussion Although the recent Inte
Page 39 and 40:
Furthermore, the most preferred pla
Page 41 and 42:
MEB. (2009). Internete erisim proje
Page 43 and 44:
Learning Management Systems (LMS) b
Page 45 and 46:
practice and repetition with feedba
Page 47 and 48:
Online peer assessment Peer assessm
Page 49 and 50:
At an international conference on m
Page 51 and 52:
� Ability to produce rich assessm
Page 53 and 54:
Meishar-Tal, H., & Tal-Elhasid, E.
Page 55 and 56:
Tsai, S. C. (2012). Integration of
Page 57 and 58:
order to decrease text complexity a
Page 59 and 60:
matter how the learners’ aptitude
Page 61 and 62:
If compared by programs, WP student
Page 63 and 64:
mean F2F 4.64 4.52 4.35 4.44 *: p
Page 65 and 66:
een analyzed and were compared with
Page 67 and 68:
Chen, G.-D., Lee, J.-H., Wang, C.-Y
Page 69 and 70:
and diverse cognitional and emotion
Page 71 and 72:
Encourage and persuade students wit
Page 73 and 74:
Materials and procedure Each studen
Page 75 and 76:
The lower part of Figure 4 shows st
Page 77 and 78:
Mayer, R. E., Sobko, K., & Mautone,
Page 79 and 80:
that an item is answered correctly
Page 81 and 82:
Table 1. Relative concepts frequenc
Page 83 and 84:
The Account Management Module provi
Page 85 and 86:
Experiment 1 and results Figure 7.
Page 87 and 88:
Utilization of Test Item 36 33 30 2
Page 89 and 90:
Utilization of Test Item 36 33 30 2
Page 91 and 92:
0.065 0.877 (17.56) 0.938 (22.55) 0
Page 93 and 94:
Hwang, G.J., Hsiao, C.L., & Tseng,
Page 95 and 96:
the diverse needs of modern industr
Page 97 and 98:
Figure 1. The framework of interdis
Page 99 and 100:
Research Design Figure 3. Recommend
Page 101 and 102:
friendly. 8. I feel the layout and
Page 103 and 104:
satisfaction, and system acceptance
Page 105 and 106:
Ozok, A. A., Fan, Q., & Norcio, A.
Page 107 and 108:
than students in a conventional cla
Page 109 and 110:
shown in Figure 2 to include the st
Page 111 and 112:
(1996) Pleasure-Arousal-Dominance (
Page 113 and 114:
Based on this connection, we consid
Page 115 and 116:
Final dominance measurements were p
Page 117 and 118:
student mood and thereby allow comp
Page 119 and 120:
Sessink, O., Beeftink, H., Tramper,
Page 121 and 122:
The rationale behind the video form
Page 123 and 124:
usefulness of the movies and their
Page 125 and 126:
case of the first two questions and
Page 127 and 128:
this paper in order to evaluate the
Page 129 and 130:
The survey questions Appendix A In
Page 131 and 132:
E-training in organizations E-train
Page 133 and 134:
efficacy regarding online training,
Page 135 and 136:
Method Sample and procedure Figure
Page 137 and 138:
0.874 which exceeded the recommende
Page 139 and 140:
Even though the result does not con
Page 141 and 142:
Brown, M., & Cudeck, R. (1993). Alt
Page 143 and 144:
Wen, Y., Looi, C.-K., & Chen, W. (2
Page 145 and 146:
when they are designing and impleme
Page 147 and 148:
paper focuses on the first three st
Page 149 and 150:
their self-esteem to be partners in
Page 151 and 152:
Democratizing knowledge and symmetr
Page 153 and 154:
1) Reading the article: At the star
Page 155 and 156:
Effects of RCKI principle-based ped
Page 157 and 158:
References Alexander, C. (1979). Th
Page 159 and 160:
Valsamidis, S., Kontogiannis, S., K
Page 161 and 162:
The Analog system (Yan et al., 1996
Page 163 and 164:
First, the number of the sessions a
Page 165 and 166:
BioLayout uses a modified version o
Page 167 and 168:
As shown in figure 3, all metrics c
Page 169 and 170:
Figure 7. The detailed results for
Page 171 and 172:
Enright, A. J., van Dongen, S., Ouz
Page 173 and 174:
Chu, S. K. W., Kwan, A. C. M., & Wa
Page 175 and 176:
teachers, as well as collaboration
Page 177 and 178:
from the students’ ratings appear
Page 179 and 180:
Table 5 summarizes the students’
Page 181 and 182:
eading blogs (i.e., reminders, easy
Page 183 and 184:
Luehmann, A. L. (2008). Using blogg
Page 185 and 186: Uzunboylu et al. (2009) examined th
Page 187 and 188: System implementation Overview of M
Page 189 and 190: Methods Before the experiment, a mo
Page 191 and 192: too simple, incomplete, or they eve
Page 193 and 194: Comparisons of UMLS Questionnaire i
Page 195 and 196: already described may play an impor
Page 197 and 198: Parker, D., Manstead, A. S. R., Str
Page 199 and 200: Avci, U., & Askar, P. (2012). The C
Page 201 and 202: Literature review The related varia
Page 203 and 204: collected through Personal Informat
Page 205 and 206: Self-efficacy 92 5 21 13.37 3.726 W
Page 207 and 208: In this problem, intention was take
Page 209 and 210: References Ajjan, H., & Hartshorne,
Page 211 and 212: Tan, T.-H., Lin, M.-S., Chu, Y.-L.,
Page 213 and 214: Students perused the course materia
Page 215 and 216: problem. All articles were then sen
Page 217 and 218: tests, electronic tests are more li
Page 219 and 220: Students 10 and 11: My classmate Ed
Page 221 and 222: collection was also approved by all
Page 223 and 224: Ubiquitous Revision Seamless Collab
Page 225 and 226: Tai, Y. (2012). Contextualizing a M
Page 227 and 228: tasks were developed from task type
Page 229 and 230: experiences in this phase. Moreover
Page 231 and 232: test is administered again right af
Page 233 and 234: The collected data were analyzed wi
Page 235: Gorp, K. V., & Bogaert, N. (2006).
Page 239 and 240: examinee’s ability more accuratel
Page 241 and 242: would be more accurate if items j+1
Page 243 and 244: C�I or Change I�I) would be 0.2
Page 245 and 246: improved by rearrangement procedure
Page 247 and 248: Table 6. Repeated Measures ANOVA of
Page 249 and 250: Leppisaari, I., & Lee, O. (2012). M
Page 251 and 252: Selection of the virtual tool The t
Page 253 and 254: Environmental themes emerged strong
Page 255 and 256: Visualization of text (visual langu
Page 257 and 258: environments where visual technolog
Page 259 and 260: - Logging in to learning environmen
Page 261 and 262: McNaught, C., Lam, P. & Lam, S.L. (
Page 263 and 264: on-site to finally achieve a meanin
Page 265 and 266: Table 1. Digital game designers and
Page 267 and 268: worldviews are socio-cultural-histo
Page 269 and 270: Phase Methods Visitor data collecti
Page 271 and 272: operate the game without the help o
Page 273 and 274: Visitors (end users) must be heard.
Page 275 and 276: Islas Sedano, C., Pawlowski, J., Su
Page 277 and 278: part of something larger than the s
Page 279 and 280: Previous studies have developed rel
Page 281 and 282: Technology Acceptance. The 10 items
Page 283 and 284: Table 4. Model Fit Indices Model χ
Page 285 and 286: In the final path model of set 1, t
Page 287 and 288:
Discussion Differing from prior stu
Page 289 and 290:
Haythornthwaite, C., Kazmer, M. M.,
Page 291 and 292:
Ge, Z.-G. (2012). Cyber Asynchronou
Page 293 and 294:
college of a university situated in
Page 295 and 296:
Table 3 shows us the following info
Page 297 and 298:
Appendix B shows Class 1’s respon
Page 299 and 300:
increase one’s ability to process
Page 301 and 302:
Appendix A Responses concerning syn
Page 303 and 304:
Shih, S.-C., Kuo, B.-C., & Liu, Y.-
Page 305 and 306:
Adaptive U-learning Math Path Syste
Page 307 and 308:
Figure 4. Online tutorial courses o
Page 309 and 310:
Participants and Experimental Proce
Page 311 and 312:
the influence of Test1 scores, an e
Page 313 and 314:
Hall, T., & Bannon, L. (2006). Desi
Page 315 and 316:
(Hofer & Pintrich, 1997; Liu, Lin &
Page 317 and 318:
� Through online peer assessment,
Page 319 and 320:
elativism.” High Internet self-ef
Page 321 and 322:
Peng, H., Tsai, C.-C., & Wu, Y.-T.
Page 323 and 324:
Since game-based learning has been
Page 325 and 326:
The second reason is about the sust
Page 327 and 328:
Methods Figure 5. Quest NPCs provid
Page 329 and 330:
These differences indicated that pa
Page 331 and 332:
quests further involves students’
Page 333 and 334:
Chang, I.-H. (2012). The Effect of
Page 335 and 336:
(1) Vision, planning and management
Page 337 and 338:
Based on an examination of the lite
Page 339 and 340:
Table 2. Analysis of reliability an
Page 341 and 342:
λ10 - 0.69 0.83 ε5 14.50 * - 0.32
Page 343 and 344:
Bailey, G. D. (1997). What technolo
Page 345 and 346:
Stegall, P. (1998). The principal:
Page 347 and 348:
instruction has studied the student
Page 349 and 350:
espond and the estimation of possib
Page 351 and 352:
Research question For the purpose o
Page 353 and 354:
READI reliability To verify the con
Page 355 and 356:
Physical -1.43333(*) .44699 .029 -2
Page 357 and 358:
The results of this study suggest t
Page 359 and 360:
Hsu, Y.-C., Ho, H. N. J., Tsai, C.-
Page 361 and 362:
domains. Five research questions we
Page 363 and 364:
Research sample groups (4) Faculty
Page 365 and 366:
13. Policies, Social Culture Impact
Page 367 and 368:
Non-Specified Others Engineering &
Page 369 and 370:
(AR = -2.1) (n = 7) 2. Non-specifie
Page 371 and 372:
(n): Total number of articles Resea
Page 373 and 374:
Compared to the publications in 200
Page 375:
Pedagogical Content Knowledge (TPAC
show all

April 2012 Volume 15 Number 2 - Educational Technology & Society

Create successful ePaper yourself

Delete template?

Save as template?