October 2007 Volume 10 Number 4 - Educational Technology ...
Chang, S.-H., Lin, P.-C., & Lin, Z. C. (2007). Measures of Partial Knowledge and Unexpected Responses in Multiple-Choice Tests. Educational Technology & Society, 10 (4), 95-109.
Measures of Partial Knowledge and Unexpected Responses in Multiple-Choice Tests

Shao-Hua Chang
Department of Applied English, Southern Taiwan University, Tainan, Taiwan // shaohua@mail.stut.edu.tw

Pei-Chun Lin
Department of Transportation and Communication Management Science, National Cheng Kung University, Taiwan // peichunl@mail.ncku.edu.tw

Zih-Chuan Lin
Department of Information Management, National Kaohsiung First University of Science & Technology, Taiwan // u9324819@ccms.nkfust.edu.tw
ABSTRACT
This study investigates differences in the partial-scoring performance of examinees under elimination testing and under conventional dichotomous scoring of multiple-choice tests implemented on a computer-based system. Elimination testing that uses the same set of multiple-choice items rewards examinees with partial knowledge over those who are simply guessing. This study provides a computer-based test and item-analysis system to reduce the difficulty of grading and item analysis following elimination tests. The Rasch model, based on item response theory for dichotomous scoring, and the partial credit model, based on graded item responses for elimination testing, form the kernel of the test-diagnosis subsystem for estimating examinee-ability and item-difficulty parameters. This study draws the following conclusions: (1) examinees taking computer-based tests (CBTs) perform the same as those taking paper-and-pencil tests (PPTs); (2) conventional scoring does not measure the same knowledge as partial scoring; (3) partial scoring of multiple-choice items lowers the number of unexpected responses from examinees; and (4) different question topics and types do not influence the performance of examinees in either PPTs or CBTs.
Keywords
Computer-based tests, Elimination testing, Unexpected responses, Partial knowledge, Item response theory
Introduction
Educators' main missions are to determine learning progress and to diagnose the difficulties students experience when studying. Testing is a conventional means of evaluating students, and test scores can be used to observe learning outcomes. Multiple-choice (MC) items continue to dominate educational testing owing to their ability to measure constructs such as ability and achievement effectively and simply. Measurement experts and testing organizations prefer the MC format to others (e.g., short-answer, essay, constructed-response) for the following reasons:
- Content sampling is generally superior to that of other formats, and the application of MC formats normally leads to highly content-valid test-score interpretations.
- Test scores can be extremely reliable given a sufficient number of high-quality MC items.
- MC items can be easily pre-tested, stored, used, and reused, particularly with the advent of low-cost, computerized item-banking systems.
- Objective, high-speed test scoring is achievable.
- Diagnostic subscores are easily obtainable.
- Test theories (i.e., item response, generalizability, and classical) easily accommodate binary responses.
- Most content can be tested using this format, including many types of higher-level thinking (Haladyna & Downing, 1989).
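The last point — that item response theories readily accommodate binary responses — can be made concrete with the two models the abstract names: the dichotomous Rasch model and Masters' partial credit model for graded responses. The sketch below is purely illustrative (it is not the authors' implementation, and the step-difficulty values are invented for the example):

```python
import math

def rasch_probability(theta, b):
    """Dichotomous Rasch model: probability that an examinee with
    ability theta answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def pcm_probabilities(theta, step_difficulties):
    """Masters' partial credit model: probabilities of landing in
    score categories 0..m on a polytomously scored item, given
    ability theta and step difficulties delta_1..delta_m."""
    # Numerator for category h is exp of the cumulative sum of
    # (theta - delta_j) for j = 1..h; category 0 uses the empty sum.
    numerators = [1.0]
    cumulative = 0.0
    for delta in step_difficulties:
        cumulative += theta - delta
        numerators.append(math.exp(cumulative))
    total = sum(numerators)
    return [n / total for n in numerators]

# An examinee whose ability equals the item difficulty has a
# 50% chance of success under the Rasch model:
print(rasch_probability(0.0, 0.0))  # 0.5

# Category probabilities under the partial credit model sum to 1
# (step difficulties here are hypothetical):
print(round(sum(pcm_probabilities(1.2, [-0.5, 0.3, 1.0])), 6))  # 1.0
```

In this framing, conventional dichotomous scoring fits the Rasch model (two categories: wrong/right), while elimination-test responses, which grade partial knowledge across several levels, fit the partial credit model's multiple ordered categories.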
However, the conventional MC examination scheme requires examinees to evaluate each option and select a single answer. Examinees are often absolutely certain that some of the options are incorrect, yet still unable to identify the correct response (Bradbard, Parker, & Stone, 2004). From the viewpoint of learning, knowledge is accumulated continuously rather than on an all-or-nothing basis. The conventional scoring format of the MC examination cannot
ISSN 1436-4522 (online) and 1176-3647 (print). © International Forum of Educational Technology & Society (IFETS). The authors and the forum jointly retain the copyright of the articles. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear the full citation on the first page. Copyrights for components of this work owned by others than IFETS must be honoured. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from the editors at kinshuk@ieee.org.