Calibration, equating, scaling

The Rasch item response theory (IRT) model provided the basis for calibrating the Insight assessment battery. This model defines a scale that unifies the statistical properties of the test items with the statistical measurement of the examinees. Specifically, for each item i there is a characteristic difficulty b_i, and for each examinee j a characteristic ability θ_j; together these determine p_ij, the probability that the examinee answers the item correctly, in a log-linear fashion:

    log( p_ij / (1 – p_ij) ) = θ_j – b_i

The Rasch model states that these probabilities are effectively independent for any set of items given to an examinee with a given ability. From the Rasch model flows an important set of consequences that permit coherent calibration and scaling of complex tests such as Insight:

- The estimated ability of an examinee is a non-linear function of the number of items answered correctly; the function can be calculated from the difficulty parameters of the included items.
- The standard errors of measurement of the abilities can also be calculated from the number of items answered correctly, as a function of the difficulty parameters of the items.
- Examinee ability can be estimated consistently on the same scale regardless of the set of items that is used.
- Item difficulty can be estimated consistently on the same scale regardless of the set of examinees that is used in calibration.

The prospect of using different item sets for different examinee samples has been important to the development and implementation of Insight. From the early development forms through the later development forms to the final form, there are common items in each subtest, so consistent scaling is obtained (horizontal equating). Similarly, within a subtest from Level 1 to Level 2 to Level 3 there are common items, so there is a consistent, equated scale across levels (vertical equating).

The Rasch model and analysis provide consistent, equated results across levels and forms. The key objective in the test design (that is, the selection of items for each level) is to provide accurate scores. Score accuracy is a matter of matching item difficulty to examinee ability: examinees of different abilities, and of different ages and grades, must be tested with a sufficient number of appropriate items, meaning items with difficulty measures numerically close to the examinees' ability measures.
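The first two consequences listed above can be made concrete with a short numerical sketch. The Python code below is not part of the Insight calibration software, and the item difficulties are hypothetical values chosen only for illustration; it computes the Rasch response probability, then recovers a maximum-likelihood ability estimate and its standard error of measurement from a number-correct score and the difficulty parameters of the items taken.

    import math

    def rasch_prob(theta, b):
        """Probability of a correct response under the Rasch model:
        log(p / (1 - p)) = theta - b."""
        return 1.0 / (1.0 + math.exp(-(theta - b)))

    def estimate_ability(num_correct, difficulties, tol=1e-6, max_iter=100):
        """Maximum-likelihood ability estimate and standard error of
        measurement for a given number-correct score on items with known
        Rasch difficulties (perfect and zero scores are excluded, since
        their maximum-likelihood estimates are unbounded)."""
        if not 0 < num_correct < len(difficulties):
            raise ValueError("number correct must be between 1 and n - 1")
        theta = 0.0
        for _ in range(max_iter):
            probs = [rasch_prob(theta, b) for b in difficulties]
            expected = sum(probs)                   # expected number correct at theta
            info = sum(p * (1 - p) for p in probs)  # test information at theta
            step = (num_correct - expected) / info  # Newton-Raphson update
            theta += step
            if abs(step) < tol:
                break
        sem = 1.0 / math.sqrt(info)                 # SEM = 1 / sqrt(information)
        return theta, sem

    # Hypothetical item difficulties for a short subtest, on the logit scale
    difficulties = [-1.5, -0.8, -0.3, 0.0, 0.4, 0.9, 1.6]
    theta_hat, sem = estimate_ability(5, difficulties)
    print(f"ability = {theta_hat:.2f} logits, SEM = {sem:.2f}")

Because the Rasch likelihood for a fixed item set depends on the responses only through the number correct, the estimate depends solely on that count and the item difficulties, which is why the same routine yields comparable ability measures whenever the item difficulties are on a common calibrated scale.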
