ICCS 2009 Technical Report - IEA


Table 11.6: National items excluded from scaling (contd.)

Country                        Item      Issue
Russian Federation             CI132M1   Incorrect translation of options
Slovak Republic                CI2CCM2   Large item-by-country interaction
Slovak Republic                CI2BIO1   Scoring problems
Slovenia                       CI127M1   Large item-by-country interaction
Switzerland (German version)   CI121M1   Translation error
Switzerland (German version)   CI129M1   Translation error
Thailand                       CI2BPM1   Large item-by-country interaction
Thailand                       CI2PCM2   Large item-by-country interaction
Thailand                       CI2SRM1   Large item-by-country interaction
Thailand                       CI2VOM2   Large item-by-country interaction

The overall reliability of the international test, as obtained from the scaling model, was 0.84 (ACER ConQuest estimate). Table 11.9 shows the median reliabilities (Cronbach's alpha) and median item numbers for national samples across booklets. The median test reliability was 0.83 and ranged from 0.70 to 0.88. The median reliabilities were below 0.8 in only six countries. In these countries, the number of items had generally been reduced as a consequence of item deletions brought about by translation/printing errors or very large item-by-country interactions (see section above on item adjudication).

International ability estimates

In many educational assessments, the purpose of testing is to obtain accurate estimates of individual domain-based cognitive abilities. The accuracy of measuring the latent ability θ can be improved by using a larger number of test items. However, in large-scale surveys such as ICCS, the purpose is to obtain accurate population estimates by using instruments that cover a wider range of possible aspects of cognitive abilities. The use of a matrix-sampling design, where individual students are allocated booklets and respond to a set of items obtained from the main pool of items, has become standard in assessments of this type.
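The Cronbach's alpha reliabilities reported above can be illustrated with a minimal sketch. The respondent-by-item score matrix here is hypothetical; the operational estimates were obtained per booklet from the national samples:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix."""
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]                               # number of items
    item_variances = x.var(axis=0, ddof=1)       # variance of each item
    total_variance = x.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)
```

Perfectly consistent items yield an alpha of 1, while items that carry no shared signal pull alpha toward 0, which is why deleting items (as in the six countries noted above) tends to lower the booklet reliability.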
However, reducing test length and administering subsets of items to individual students introduces a considerable degree of uncertainty at the individual level, and aggregating student abilities of this type can lead to bias in population estimates. This problem can be addressed by employing plausible value methodology, which uses all available information from student tests and questionnaires and leads to more accurate population estimates (Mislevy, 1991; Mislevy & Sheehan, 1987; von Davier, Gonzalez, & Mislevy, 2009).

Using item parameters anchored at their estimated values from the calibration sample makes it possible to randomly draw plausible values from the marginal posterior of the latent distribution for each individual. Estimations are based on the conditional item response model and the population model, which includes the regression on background variables used for conditioning. (For a detailed description, see Adams, Wu, & Macaskill, 1997; also Adams, 2002.) In order to obtain estimates of students' civic knowledge, ACER ConQuest software was used, thereby allowing plausible values to be drawn (see Wu et al., 2007).
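The drawing of plausible values described above can be sketched in a simplified form. The following is an illustration only, not the operational ICCS procedure: it uses a Rasch model with anchored item difficulties and a normal population prior evaluated on a grid, whereas the operational population model also conditions the prior on background variables via a latent regression. All item parameters and responses below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_plausible_values(responses, item_difficulties,
                          mu=0.0, sigma=1.0, n_pv=5):
    """Draw plausible values for one student: Rasch likelihood with
    anchored difficulties, combined with a N(mu, sigma^2) prior.
    In a conditioning model, mu would come from a regression on
    background variables rather than being a constant."""
    grid = np.linspace(-4.0, 4.0, 401)           # discretized ability scale
    # Rasch probability of a correct response at each grid point
    p = 1.0 / (1.0 + np.exp(-(grid[:, None] - item_difficulties[None, :])))
    # Likelihood of the observed response pattern
    like = np.prod(np.where(responses[None, :] == 1, p, 1.0 - p), axis=1)
    prior = np.exp(-0.5 * ((grid - mu) / sigma) ** 2)
    posterior = like * prior
    posterior /= posterior.sum()                 # normalize to a distribution
    # Plausible values are random draws from the individual posterior
    return rng.choice(grid, size=n_pv, p=posterior)
```

Because each plausible value is a random draw from the posterior rather than a point estimate, statistics aggregated over them reflect the measurement uncertainty that short booklets introduce at the individual level.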
