Calibration, equating, scaling

The Rasch item response theory (IRT) model provided the basis for calibrating the Insight assessment battery. This model defines a scale that unifies the statistical properties of the test items with the statistical measurement of the examinees. Specifically, for each item i there is a characteristic difficulty b_i, and for each examinee j a characteristic ability θ_j; together these determine p_ij, the probability that the examinee answers the item correctly, in a log-linear fashion:

    log( p_ij / (1 – p_ij) ) = θ_j – b_i

The Rasch model states that these probabilities are effectively independent for any set of items given to an examinee with a given ability. From the Rasch model flows an important set of consequences that permit coherent calibration and scaling of complex tests such as Insight:

- The estimated ability of an examinee is a non-linear function of the number of items answered correctly; the function can be calculated from the difficulty parameters of the included items.
- The standard errors of measurement of the abilities can also be calculated from the number of items answered correctly, as a function of the difficulty parameters of the items.
- Examinee ability can be estimated consistently on the same scale regardless of the set of items that is used.
- Item difficulty can be estimated consistently on the same scale regardless of the set of examinees that is used in calibration.

The prospect of using different item sets for different examinee samples has been important to the development and implementation of Insight. From the early development forms through the later development forms to the final form, there are common items in each subtest, so consistent scaling is obtained (horizontal equating). Similarly, within a subtest from Level 1 to Level 2 to Level 3 there are common items, so there is a consistent, equated scale across levels (vertical equating).

The Rasch model and analysis provide consistent, equated results across levels and forms. The key objective in the test design (that is, the selection of items for each level) is to provide accurate scores. Score accuracy is a matter of matching item difficulty to examinee ability: examinees of different abilities, and of different ages and grades, must be tested with a sufficient number of appropriate items, meaning items with difficulty measures numerically close to the examinees' ability measures.
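The first two consequences listed above can be made concrete with a short numerical sketch. The Python code below is not part of the Insight calibration software, and the item difficulties are hypothetical values chosen only for illustration; it computes the Rasch response probability, then recovers a maximum-likelihood ability estimate and its standard error of measurement from a number-correct score and the difficulty parameters of the items taken.

    import math

    def rasch_prob(theta, b):
        """Probability of a correct response under the Rasch model:
        log(p / (1 - p)) = theta - b."""
        return 1.0 / (1.0 + math.exp(-(theta - b)))

    def estimate_ability(num_correct, difficulties, tol=1e-6, max_iter=100):
        """Maximum-likelihood ability estimate and standard error of
        measurement for a given number-correct score on items with known
        Rasch difficulties (perfect and zero scores are excluded, since
        their maximum-likelihood estimates are unbounded)."""
        if not 0 < num_correct < len(difficulties):
            raise ValueError("number correct must be between 1 and n - 1")
        theta = 0.0
        for _ in range(max_iter):
            probs = [rasch_prob(theta, b) for b in difficulties]
            expected = sum(probs)                   # expected number correct at theta
            info = sum(p * (1 - p) for p in probs)  # test information at theta
            step = (num_correct - expected) / info  # Newton-Raphson update
            theta += step
            if abs(step) < tol:
                break
        sem = 1.0 / math.sqrt(info)                 # SEM = 1 / sqrt(information)
        return theta, sem

    # Hypothetical item difficulties for a short subtest, on the logit scale
    difficulties = [-1.5, -0.8, -0.3, 0.0, 0.4, 0.9, 1.6]
    theta_hat, sem = estimate_ability(5, difficulties)
    print(f"ability = {theta_hat:.2f} logits, SEM = {sem:.2f}")

Because the Rasch likelihood for a fixed item set depends on the responses only through the number correct, the estimate depends solely on that count and the item difficulties, which is why the same routine yields comparable ability measures whenever the item difficulties are on a common calibrated scale.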
