The Caribbean Examiner

The CXC Examination System: Criterion-Referenced Tests as used by CXC

By Anthony Haynes, PhD

Intended Purposes: Why Use Criterion-Referenced Tests (CRTs)?

Criterion-Referenced Tests (CRTs) were introduced and made popular in educational and assessment fields by Popham and Husek (1969) and Glaser (1963). A criterion-referenced test can be described as one where the candidate's performance on the test is compared with an external criterion or standard of performance, without regard to the distribution of scores achieved by other candidates. In other words, it is the performance on, or mastery of, the criteria that matters, even if all the candidates obtain the same score.

There are various definitions of CRT, but the one which best describes CXC's experience is that given by Glaser and Nitko (1971, p. 653). They defined a CRT as one "...that is deliberately constructed so as to yield measurements that are directly interpretable in terms of specified performance standards... The performance standards are usually specified by defining some domain of tasks that the student should perform. Representative samples of tasks from this domain are organised into a test.
Measurements are taken and are used to make a statement about the performance of each individual relative to that domain."

From its inception in 1972, CXC adopted the CRT model and used this approach when its first examinations were administered in 1979. CRT was implemented since it provided the means for reporting on students' achievement in relation to content, cognitive abilities and skills that are clearly defined in syllabuses developed by CXC and readily available to the public.

In adopting the CRT model, CXC has focused on three main aspects: performance standards, test development, and certification, which involves the interpretation of test scores.

Performance Standards: Syllabuses

CXC employs a consultative process for syllabus development. Under the guidance of syllabus officers, subject panels prepare syllabuses for the various subjects examined by CXC. The performance standards for a subject are clearly outlined in the respective syllabus. The required content and skills are arranged into units/content areas, profiles and specific objectives. Typically, three papers are administered on different occasions.

The various steps taken to enhance test validity involve providing syllabuses with clearly defined objectives, from which specific objectives are targeted to create a table of specifications, which is used to plan the test. The selected specific objectives are judged by the examining committee to be important and representative of the test domain. The test items are based on the specific objectives in the table of specifications and are written to fit those objectives, as well as the profile dimensions specified by each objective. The items are subsequently aligned to the selected criterion grade levels. The specific objectives in the syllabus are considered criteria to be achieved, or skills to be mastered, by the candidate.
The critical question which must be answered to judge the level of mastery is: has a candidate demonstrated a given level of competence, allowing the examiner to state, with some degree of confidence, that the candidate has achieved the minimum standard required for a particular grade?

24 MAY 2012 www.cxc.org
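The decision rule implied by this question can be sketched in a few lines of Python. The sketch is purely illustrative (the candidates, scores and 50-mark minimum standard are invented, not CXC's actual figures), but it captures the defining feature of CRT: each candidate is judged against a pre-set standard, not against the other candidates.

```python
# Illustrative only: criterion- vs norm-referenced interpretation of the
# same scores. Names, scores and the 50-mark standard are invented.
scores = {"Ann": 78, "Ben": 78, "Cai": 78, "Dee": 42}

# Criterion-referenced: every candidate who reaches the pre-set standard
# is judged to have mastered the criteria -- even if several candidates
# obtain exactly the same score.
MINIMUM_STANDARD = 50
crt_result = {name: s >= MINIMUM_STANDARD for name, s in scores.items()}

# Norm-referenced (for contrast): candidates are ranked against each
# other, so the outcome depends on the group's score distribution.
ranked = sorted(scores, key=scores.get, reverse=True)

print(crt_result)  # three candidates meet the standard, one does not
```

Under the criterion-referenced rule, the three candidates on 78 marks all meet the standard; their relative rank is irrelevant to the decision.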
Test Development: Selection of Test Content

The use of expert judgement is another of the techniques utilised by CXC in its CRT model. The examining committees regard content validity highly and seek to ensure that the test content adequately represents the content domain and that the items adequately reflect the construct.

When setting standards or cut-off scores for each grade, the examiners are primarily concerned with whether or not candidates have reached established levels of mastery. They compare candidates' performance not with that of other candidates in the group, but with the pre-set standard judged to be adequate for the award of particular grades.

They also ensure that the items which make up the entire examination constitute a representative sample of the criteria (or specific objectives) chosen from the syllabus. This process begins with the development of a table of specifications, which shows the relative weight of each selected objective within each cell, along with the profile dimension specified by each objective. It also outlines the content of the test, the number of items, the item formats, the desired psychometric properties of the items, and the arrangement of items and sections. Using the test specifications as the blueprint, questions/items are written or selected to be adequate exemplars of the tasks identified, and must be important for the proficiency level for which the examination is intended.
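As a rough illustration of how such a blueprint drives test assembly, the fragment below allocates a fixed number of items in proportion to each objective's relative weight. The objective codes, weights, profile labels and 40-item total are hypothetical, not drawn from any CXC syllabus:

```python
# Hypothetical table of specifications: each selected objective carries a
# relative weight (%) and a profile dimension; item counts are then
# allocated in proportion to the weights.
spec = {
    # objective: (relative weight %, profile dimension)
    "O1": (30, "Knowledge"),
    "O2": (25, "Use of Knowledge"),
    "O3": (25, "Use of Knowledge"),
    "O4": (20, "Practical Skills"),
}
TOTAL_ITEMS = 40

allocation = {
    obj: round(TOTAL_ITEMS * weight / 100)
    for obj, (weight, _profile) in spec.items()
}

# The assembled test should mirror the syllabus weightings.
assert sum(allocation.values()) == TOTAL_ITEMS
print(allocation)  # {'O1': 12, 'O2': 10, 'O3': 10, 'O4': 8}
```

The final assertion reflects the representativeness requirement: the item counts, taken together, reproduce the proportional weightings set out in the blueprint.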
Although no test can measure everything of importance, the content is selected on the basis of its significance in the syllabus rather than on how well it discriminates among candidates. Based on the judgement of the examiners, some items are retained in spite of statistics which do not meet the criteria of traditional norm-referenced examinations, since by discarding items with low correlations with other items or with the total test score, the examiner "risks making the test less representative of the defined universe" (Cronbach, 1970, p. 458). The examiner's process of creating a representative test focuses on identifying the various subsections/units of the syllabus that are specified, and then ensuring that the test reflects the proportional weightings of each subsection (Cronbach, 1970). The preliminary grade cut-off scores are also specified at the paper development stage.

Certification: Test Interpretation and Validation

The validity of the examination is primarily assured at the test construction/paper-setting stage, whereas the reliability of scores is assured at the standardising, marking and grading stages. CXC's priority is to ensure that marking is fair and that grades are valid and accurate. The marking is standardised in such a way that the mark scheme is sufficiently clear and unambiguous to be used by markers working with little direct coordination and supervision. There are two aspects to the mark scheme: the first is the profile/skill/content area against which the candidate's performance is judged; the second is a scale on which marks are awarded, depending on how much of the skill the candidate exhibits. The different criterion performance levels (grade criteria), which are clearly specified before the test is constructed, are used as standards for interpreting candidates' scores.

The absolute descriptive data on a candidate's performance are provided in terms of an overall grade and profile grades, to show what the candidate can or cannot do.
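How pre-set grade criteria are used to interpret a candidate's overall and profile scores can be sketched as follows. The grade labels, cut-off marks and profile scores below are invented for illustration only; they are not CXC's actual grade boundaries:

```python
# Invented cut-off scores: each grade's lower bound is fixed before the
# test is marked and does not depend on how other candidates perform.
CUT_OFFS = [("I", 80), ("II", 65), ("III", 50), ("IV", 35), ("V", 0)]

def grade(score: int) -> str:
    """Return the highest grade whose cut-off the score meets."""
    for label, cut in CUT_OFFS:
        if score >= cut:
            return label
    return CUT_OFFS[-1][0]

# A candidate is reported with an overall grade plus profile grades,
# showing what he or she can or cannot do in each profile dimension.
profiles = {"Knowledge": 72, "Use of Knowledge": 58, "Practical Skills": 81}
overall = grade(sum(profiles.values()) // len(profiles))
report = {p: grade(s) for p, s in profiles.items()}
```

Because the cut-offs are fixed in advance, the same score always maps to the same grade; the profile grades add the descriptive detail of where the candidate's strengths and weaknesses lie.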
After the marking exercise, the preliminary grade cut-off scores which were specified at paper development are reviewed. The review takes the form of a validation process, which occurs during the grading exercise, when the quality of candidates' performance and scores is compared across adjacent years and sittings, and the preliminary cut-off scores may be adjusted. This is done in order to ensure that the objectively defined standard (that is, the level of competence) connoted by each grade is maintained across time. This process includes the statistical evaluation of items/questions and profiles to ascertain that they measure what they purport to measure, are appropriate for the test population, minimise the amount of test error, and are coherent in style and format.

Summary

In terms of testing procedures, CXC has been among the forerunners in instituting criterion-referenced testing, SBA and profiling in achievement testing. CXC has adopted CRT for assessing how well candidates have mastered the specified content domain and associated skills. The criterion-referenced approach is regarded by CXC as one of the fairest and most transparent systems of assessment, and one that allows the users of its certificates to make sound inferences about candidates' mastery of the domains tested.

References

Cronbach, L. J. (1970). Essentials of Psychological Testing (3rd ed.). New York: Harper.

Glaser, R. (1963). Instructional technology and the measurement of learning outcomes. American Psychologist, 18, 519-521.

Glaser, R., & Nitko, A. J. (1971). Measurement in learning and instruction. In R. Thorndike (Ed.), Educational Measurement (pp. 625-670). Washington, DC: American Council on Education.

Popham, W. J., & Husek, T. R. (1969). Implications of criterion-referenced measurement. Journal of Educational Measurement, 6, 1-9.

Criterion-Referenced Approach. http://www.cxc.org/examinations/understanding-our-exams. Retrieved 28 March 2012.

Dr Anthony Haynes is a Measurement and Evaluation Officer in the Examinations Development and Production Division at CXC.