ICCS 2009 Technical Report - IEA
fell below 70 percent were removed. All open-ended items for two countries were omitted because it was evident that the students in those countries found it easier than their international counterparts to answer these items correctly.

National centers were provided with item statistics (see example in Table 11.5) and requested to review flagged test items. These items included cases of unusual item-total correlation (e.g., negative correlations between correct response and overall score) and those showing large differences between national and international item difficulties. They also included open-ended items where the category-total correlations were disordered. In some cases, national centers informed the international study center (ISC) of translation problems that had not been detected during verification. In these cases, the items were categorized as "not administered" in the international database and were excluded from scaling of the corresponding national data.

Working independently from those conducting the national item reviews, members of the ISC flagged national items that showed irregular scaling properties (item misfit or large item-by-country interactions) and conducted post-verifications of item translation. In a number of cases, they identified additional national items that needed to be set to "not administered" in the international database and then excluded from scaling of the corresponding national data.

There were instances of items being correctly translated but showing item-by-country interaction estimates larger than 1.3 logits (roughly two standard deviations of the overall distribution of item difficulties in the test). In all such cases, the national items were removed from scaling of the national data but included in the international database. Table 11.6 lists the items that were excluded from scaling across the various national samples because of translation/printing errors or large item-by-country interactions.

International item calibration and test reliability

Item parameters were obtained from calibration samples consisting of randomly selected subsamples from each country. The calibration of the item parameters involved randomly selecting subsamples of 500 students from each national sample. This process ensured that each country that had met the sample participation requirements was equally represented in the calibration sample. The random selection was based on the final student weights, and the final calibration sample included data from 18,000 students.

Missing student responses that were likely to be due to problems with test length ("not-reached items") were omitted from the calibration of item parameters but were treated as incorrect during the scaling of the student responses. Not-reached items were defined as all consecutive missing values counted from the end of the test backward; however, the first missing value of each of these not-reached series was coded as "missing" (a sketch of this coding rule appears at the end of this section).

Data from countries that did not meet the sampling requirements after the inclusion of replacement schools (Category 3) were not included in the calibration of item parameters. Table 11.7 shows the final item parameters, based on the international calibration sample, that were used to scale the ICCS test data. The table also shows the standard errors for these parameters.

To account for possible positioning effects caused by the allocation of items to different booklets, a facet model that included a booklet effect was used during scaling; this model was estimated with the software package ACER ConQuest.
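To make the role of the booklet facet concrete, the following minimal Python sketch shows how a booklet effect enters a Rasch-type response probability as an additive shift on item difficulty. This is purely illustrative: the function name and the example values are invented here, and the actual estimation was carried out in ACER ConQuest.

```python
import math

def p_correct(theta: float, item_delta: float, booklet_beta: float) -> float:
    """Rasch-type probability of a correct response under a booklet facet.

    theta        -- student ability (logits)
    item_delta   -- item difficulty (logits)
    booklet_beta -- booklet effect (logits); positive values make every
                    item administered in that booklet slightly harder
    """
    logit = theta - (item_delta + booklet_beta)
    return 1.0 / (1.0 + math.exp(-logit))

# An average student (theta = 0) on an average item (delta = 0):
print(p_correct(0.0, 0.0, 0.0))  # 0.50 in a neutral booklet
print(p_correct(0.0, 0.0, 0.1))  # ~0.475 in a booklet 0.1 logits harder
```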
The booklet effects that emerged were generally rather small, with Booklet 5 being about 0.03 logits easier and Booklet 7 about 0.03 logits more difficult than the average booklet difficulty. The inclusion of the booklet facet in the scaling did not change the estimated item parameters but ensured that differences in booklet difficulty did not affect the scaling of student abilities. Table 11.8 shows the booklet parameters used for the final scaling of the test data.
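Finally, the not-reached coding rule described above can be expressed as a short sketch. This is a hypothetical helper written for illustration only (the function name and status labels are invented); it assumes each student's item responses are stored in booklet order, with None marking a missing response.

```python
from typing import List, Optional

OMITTED = "omitted"          # an ordinary missing response
NOT_REACHED = "not_reached"  # dropped from calibration, scored incorrect in scaling

def code_missing_responses(responses: List[Optional[int]]) -> list:
    """Flag the trailing run of missing values as not reached, except for
    the first missing value of that run, which stays an ordinary omission."""
    coded: list = list(responses)
    # Find where the trailing run of missing values begins.
    start = len(coded)
    while start > 0 and coded[start - 1] is None:
        start -= 1
    # First missing value of the run stays "omitted"; the rest are not reached.
    for j in range(start, len(coded)):
        coded[j] = OMITTED if j == start else NOT_REACHED
    # Missing values embedded earlier in the booklet are ordinary omissions.
    for j in range(start):
        if coded[j] is None:
            coded[j] = OMITTED
    return coded

# A student answers items 1-3, skips item 4, answers item 5, then stops:
print(code_missing_responses([1, 0, 1, None, 1, None, None, None]))
# -> [1, 0, 1, 'omitted', 1, 'omitted', 'not_reached', 'not_reached']
```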
