10.07.2015 Views

ICCS 2009 Technical Report - IEA


A separate missing category called “not reached” (coded as 6) was created for analysis purposes. An item was coded as not reached if the student concerned did not respond to any of the items following it (i.e., did not continue on to the end of the test) and/or if he or she did not respond to the item preceding it. The extent of occurrence of Code 6 items provided information about the appropriateness of the test’s length as well as the appropriateness of its difficulty.

Figure 11.5 shows the percentages of not-reached responses by item position in Test Booklet 1 for regional groups of countries. As can be seen, the occurrence of not-reached responses was far higher in the Latin American countries than in the other groupings of countries, where nearly all students had no problem with test length. In the Latin American countries, about 15 to 16 percent of students, on average, did not reach the last item in Test Booklet 1. Regional patterns in relation to the other booklets were similar. However, note that there was some variation within the country groups. In Latin America, for example, the national percentages of not reached for the last booklet item ranged from 9 to 24 percent.

Figure 11.5: Percentages of not-reached responses for groups of countries for Test Booklet 1
[Figure not reproduced. Vertical axis: 0.0 to 18.0; horizontal axis: item positions 1 to 31; series: Europe, Latin America, Asia-Pacific, International.]

International item adjudication

Adjudication of test items was carried out first at the international level for the ICCS calibration sample and then separately for each national subsample.

At the international level, item characteristics were assessed for the calibration sample. Here, the review encompassed item-fit statistics, item-score correlations, item characteristic curves, general measurement equivalence across countries (item-by-country interaction), and gender DIF. For open-ended items, scorer reliabilities and the correct ordering of average ability estimates per category were also taken into account. Only one of the 80 test items (CI2HRM2) had inadequate scaling properties. It was removed from the international scaling of civic knowledge.

At the national level, test items were reviewed by comparing national item-fit statistics with international item-fit statistics. Test items for individual countries that showed large item-by-country interactions were flagged, and open-ended national items for which scorer agreement

SCALING PROCEDURES FOR ICCS TEST ITEMS 139
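The not-reached coding described earlier in this section, and the per-position percentages of the kind plotted in Figure 11.5, can be illustrated with a minimal Python sketch. It is not the procedure used in the study. It assumes one reading of the rule: an item is not reached when it and all later items are missing and the item preceding it is missing too, so the first item of a trailing run of missing responses stays an ordinary omission. Code 6 comes from the report; the omitted code 9, the use of `None` for a missing response, and the function names are hypothetical.

```python
from typing import List, Optional

NOT_REACHED = 6  # code named in the report
OMITTED = 9      # hypothetical code for an ordinary omitted response


def recode_not_reached(responses: List[Optional[int]]) -> List[int]:
    """Recode one student's booklet; None marks a missing response.

    Assumed reading of the rule: within the trailing run of missing
    responses, the first item stays OMITTED (the item before it was
    answered) and every later item becomes NOT_REACHED.
    """
    coded = [OMITTED if r is None else r for r in responses]

    # Walk back from the end to find where the trailing run of missing starts.
    start = len(responses)
    while start > 0 and responses[start - 1] is None:
        start -= 1

    # Everything after the first item of that run is not reached.
    for pos in range(start + 1, len(responses)):
        coded[pos] = NOT_REACHED
    return coded


def percent_not_reached(coded_students: List[List[int]]) -> List[float]:
    """Percentage of students with NOT_REACHED at each item position."""
    n_students = len(coded_students)
    n_items = len(coded_students[0])
    return [
        100.0 * sum(s[i] == NOT_REACHED for s in coded_students) / n_students
        for i in range(n_items)
    ]


if __name__ == "__main__":
    # Three made-up students on a six-item booklet (illustrative data only).
    raw = [
        [1, None, 0, None, None, None],
        [1, 1, 0, 1, 0, 1],
        [0, 1, 1, 1, None, None],
    ]
    coded = [recode_not_reached(r) for r in raw]
    print(coded)
    print(percent_not_reached(coded))
```

Under this reading, a missing response in the middle of the booklet is never recoded, which keeps isolated omissions separate from the effect of running out of time.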
