ICCS 2009 Technical Report - IEA
Assessment of scorer reliabilities

The open-ended items in the ICCS cognitive test were scored according to the scoring guides that were refined as an outcome of experiences in the international field trial of test items. Within countries, for each of the seven booklets, subsamples of about 100 student records were scored twice by different scorers. This double-scoring procedure provided an assessment of scorer reliabilities. Table 11.2 shows the percentages of scorer agreement, which ranged from 49 to 100 percent. On average, scorer agreement for the six items was between 85 and 89 percent.

As has been the practice in other IEA studies, data from items scored with a minimum of 70 percent scorer agreement were retained for scaling and inclusion in the international database. This adjudication was made for each open-response item scored in each country.

Differential item functioning by gender

Further exploration of the quality of the items was conducted through an assessment of differential item functioning (DIF) by gender. DIF occurs when groups of students with the same degree of ability have different probabilities of responding correctly to an item. For example, if boys have a higher probability than girls of the same ability of correctly answering an item, the item shows gender DIF. This situation violates the model, which assumes that the probability of a correct response is a function of ability only and not of any group membership.

Estimates of gender DIF were derived by including interaction terms in the item response model. Gender DIF for dichotomous items could then be estimated as:

$$P_i(\theta_n) = \frac{\exp\left(\theta_n - (\delta_i - \eta_g - \lambda_{ig})\right)}{1 + \exp\left(\theta_n - (\delta_i - \eta_g - \lambda_{ig})\right)}$$

For the purpose of measuring parameter equivalence across the two gender groups $g$, an additional parameter for gender effects, $\lambda_{ig}$, is added to the scaling model, where $\theta_n$ is the estimated ability of person $n$ and $\delta_i$ is the estimated location of item $i$. However, to obtain proper estimates, the overall gender effect $\eta_g$ must also be included in the model.¹ Both the item-by-gender interaction estimates $\lambda_{ig}$ and the overall gender effects $\eta_g$ were constrained to sum to 0.

Gender DIF estimates for a partial credit model for items with more than two categories (here, constructed items) could then be modeled as:

$$P_{x_i}(\theta_n) = \frac{\exp\left(\sum_{k=0}^{x_i}\left(\theta_n - (\delta_i - \eta_g - \lambda_{ig} + \tau_{ik})\right)\right)}{\sum_{h=0}^{m_i}\exp\left(\sum_{j=0}^{h}\left(\theta_n - (\delta_i - \eta_g - \lambda_{ig} + \tau_{ij})\right)\right)}, \qquad x_i = 0, 1, 2, \ldots, m_i$$

Here, $\theta_n$ denotes the person's ability, $\delta_i$ gives the item location parameter on the latent continuum, $\tau_{ij}$ is the step parameter, $\lambda_{ig}$ is the item-by-gender interaction effect, and $\eta_g$ is the overall gender effect.

Table 11.3 shows the gender DIF estimates for those items retained for scaling. As is apparent in the table, only a few items, five multiple-choice and one open-ended, showed some (limited) form of DIF (estimates larger than 0.3 logits). In general, because the gender DIF for ICCS test items was viewed as not posing a serious problem, it was decided not to exclude any items from scaling on the basis of gender DIF.

¹ The minus sign ensures that higher values of the gender effect parameters indicate higher levels of item endorsement in the gender group with the higher value (here, females).
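To make the two expressions above concrete, the following minimal sketch evaluates them directly in Python. It illustrates the formulas only and is not the operational ICCS scaling; the function names and all parameter values in the usage lines are invented for the example, and the partial credit step parameters are passed as a simple list with the category-0 term taken as zero, following the usual partial credit convention.

```python
import math

def dichotomous_probability(theta, delta, eta_g, lambda_ig):
    """P_i(theta) for a dichotomous item with an overall gender effect (eta_g)
    and an item-by-gender interaction (lambda_ig)."""
    z = theta - (delta - eta_g - lambda_ig)
    return math.exp(z) / (1.0 + math.exp(z))

def pcm_probabilities(theta, delta, eta_g, lambda_ig, taus):
    """Category probabilities 0..m_i for a partial credit item with gender DIF.
    `taus` holds the step parameters tau_i1..tau_im_i; the empty sum for
    category 0 is taken as zero."""
    cumulative = [0.0]  # running sums of theta - (delta - eta_g - lambda_ig + tau_ij)
    for tau in taus:
        cumulative.append(cumulative[-1] + theta - (delta - eta_g - lambda_ig + tau))
    weights = [math.exp(c) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical parameter values, for illustration only; the gender effects and
# interactions are set to sum to 0 across the two groups, as in the constraint above.
print(dichotomous_probability(theta=0.5, delta=0.2, eta_g=0.1, lambda_ig=0.15))
print(dichotomous_probability(theta=0.5, delta=0.2, eta_g=-0.1, lambda_ig=-0.15))
print(pcm_probabilities(theta=0.5, delta=0.2, eta_g=0.1, lambda_ig=0.15, taus=[-0.4, 0.4]))
```

An item shows gender DIF in this setup when the interaction term shifts its effective location differently for the two groups, so two students with the same ability but different gender obtain different probabilities from the first function.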

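Returning to the scorer-reliability adjudication described at the start of this section: it reduces to a percent-agreement calculation per open-response item and country, compared against the 70 percent retention rule. The sketch below shows one way such a check could be coded; the paired-list data layout and the example scores are assumptions for illustration, not the ICCS data format.

```python
def percent_agreement(first_scores, second_scores):
    """Percentage of double-scored responses on which the two scorers agree."""
    if len(first_scores) != len(second_scores) or not first_scores:
        raise ValueError("Expected two equally long, non-empty score lists")
    matches = sum(a == b for a, b in zip(first_scores, second_scores))
    return 100.0 * matches / len(first_scores)

def retained_for_scaling(first_scores, second_scores, threshold=70.0):
    """Apply the 70 percent scorer-agreement rule for inclusion in the database."""
    return percent_agreement(first_scores, second_scores) >= threshold

# Invented double scores for one open-ended item in one country
first = [1, 2, 0, 1, 1, 2, 0, 0, 1, 2]
second = [1, 2, 0, 1, 0, 2, 0, 1, 1, 2]
print(percent_agreement(first, second))    # 80.0
print(retained_for_scaling(first, second)) # True
```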