Lavery, L., & Brown, G. T. L.

individual students or groups/subgroups of students. Teachers design an asTTle test by selecting the curriculum areas and levels of difficulty that they wish to assess. These selections are used by the asTTle tool to create a 40-minute pencil-and-paper test consisting of a mixture of open- and closed-response items. Once student responses and scores are entered into the asTTle tool, teachers may select a range of reports that allow them to interpret student performance by reference to nationally representative norms, curriculum levels, and curriculum achievement objectives. Specifically, asTTle answers questions about (a) how well students are doing compared to similar students, (b) how well students are doing on important achievement objectives, (c) how well students are doing compared to curriculum achievement levels, and (d) what teaching resources would assist in improving students' performance.

Methodology

New Zealand teachers participated in a variety of workshops around the country, run by the asTTle team, to write and review assessment items for reading and writing. Their task was to write and review assessment materials appropriate to the interests, achievement objectives, and ages of students in Years 5-7 learning at Curriculum Levels 2-4, consistent with the principles of the appropriate curriculum document and with good classroom practice. Once reviewed, items were assembled into test forms estimated to require 40 minutes for the majority of students.

Two different strategies were used in assembling the test forms. In reading and writing, each form had an approximately equal distribution of items or tasks across Curriculum Levels 2 to 4 and was administered to students at any of the target year levels. In contrast, the mathematics test forms, although containing materials from all three levels, were designed to have more items of a certain level for use at each of the target year levels.
The Year 5 papers had more Level 2 items, the Year 6 papers had more Level 3 items, and the Year 7 papers had more Level 4 items. Note that the test forms as administered were balanced quite differently from the asTTle computer tool, which allows teachers to custom-select the desired difficulty regardless of the year or age of students. For example, a teacher using asTTle can create a test with few or no hard items for a younger or less able group of students, and vice versa.

Trials and calibrations for asTTle assessments across the reading, writing, and mathematics domains in both languages were conducted on behalf of asTTle by classes of students in New Zealand schools. Each trial and calibration was administered by teachers with their own class of students. Between October 2000 and June 2002, data were collected for English reading, writing, and mathematics a total of ten times (i.e., six for literacy and four for numeracy). Trials were conducted on relatively small samples to identify dimensions of the items and test forms that might need modification. Calibrations were conducted after the trials with large, nationally representative samples for the purpose of establishing New Zealand student performance norms. Table 1 shows that nearly 50,000 students completed the trials and calibrations across the three English subjects.

Table 1
Trial and Calibration Student Sample by Subject

Subject        Sample Size
Reading        24,513
Writing        10,377
Mathematics    13,397
Total          48,287

In Maori, there are about 2,500 students in Maori-medium instruction in each of the target years. In order to prevent over-usage of this small number of students, trials and calibrations were combined so that data were collected three times between November 2001 and June 2002. Note that this report is based on teacher feedback and evaluative comments on the English reading, writing, and mathematics assessments only.
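The two form-assembly strategies described above can be sketched in code. This is an illustrative sketch only: the specific weights and the item pool are hypothetical assumptions, not the actual asTTle assembly specification; the report states only that mathematics forms had more items of one curriculum level per year level while literacy forms were spread evenly across Levels 2-4.

```python
import random

# Hypothetical level weights, for illustration only (not the asTTle
# specification). Mathematics forms draw more items from one curriculum
# level per year level; literacy forms draw evenly from Levels 2-4.
MATHS_WEIGHTS = {
    5: {2: 0.5, 3: 0.3, 4: 0.2},    # Year 5: mostly Level 2
    6: {2: 0.25, 3: 0.5, 4: 0.25},  # Year 6: mostly Level 3
    7: {2: 0.2, 3: 0.3, 4: 0.5},    # Year 7: mostly Level 4
}
LITERACY_WEIGHTS = {2: 1 / 3, 3: 1 / 3, 4: 1 / 3}  # equal spread


def assemble_form(pool, weights, n_items):
    """Draw n_items from pool (a dict: curriculum level -> list of items),
    with the number taken per level proportional to the given weights."""
    form = []
    for level, weight in weights.items():
        take = round(n_items * weight)
        form.extend(random.sample(pool[level], take))
    return form


# Hypothetical item pool: 30 items per curriculum level.
pool = {lvl: [f"L{lvl}-{i}" for i in range(30)] for lvl in (2, 3, 4)}
year5_maths = assemble_form(pool, MATHS_WEIGHTS[5], 20)   # 10 L2, 6 L3, 4 L4
literacy = assemble_form(pool, LITERACY_WEIGHTS, 21)      # 7 per level
```

A Year 5 mathematics form assembled this way is weighted toward Level 2 items, whereas a literacy form draws the same number of items from each level, matching the two strategies described in the report.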
Technical Report 33, 2002

Each teacher who administered the tests was asked to complete a form that structured feedback around key item and tool development characteristics. The key characteristics for which evaluative feedback was sought revolved around the quality of the items, specifically their appropriateness in terms of interest or engagement, difficulty, and length of time needed to complete, and around the quality of the instructions supplied with the test forms. In addition, teachers were asked for their own general comments and were asked to summarise the nature of students' experience and evaluation of the materials. The intention was to use the teacher feedback to modify items for use in the asTTle tool by adjusting the language, content, or nature of items and the length of time allowed for completing them, and to improve the administration instructions.

Results

The maximum possible number of teacher responses was calculated as one teacher per 30 students. On that basis, approximately 820 responses related to reading, 350 related to writing, and 450 related to mathematics could have been expected. From the most frequently asked questions it is possible to identify that at least 459 reading teachers, 483 writing teachers, and 333 mathematics teachers participated. This translates to about 56% of reading teachers, 138% of writing teachers, and 74% of mathematics teachers. The high number of writing responses suggests that some teachers completed more than one form.

Responses took the form of comments on prepared questions. The comments were generally coded as "Yes", "No", "Both yes and no", or "No answer". The category "Yes" indicates a favourable or positive response to the question, "No" an unfavourable or negative response, and "Both yes and no" indicates a response containing both positive and negative comments.
"No answer" includes comments that were incapable of meaningful interpretation.

Data for this report come from previously reported results (Zwiegelaar, 2000; Langstaff, 2000; Irving, 2001; Parker, 2001; Parker & Brown, 2002; Zwiegelaar & Brown, 2002; Schouten & Brown, 2002; Lavery & Brown, 2002) of teacher feedback on the asTTle test forms and items. The data, although not always collected in the same manner across instances, have been converted to a common format for this report. For example, the student response and general comments questions in both the reading and writing domains had to be altered in order to be combined. As some responses had been coded per teacher and others had been reported by total number of comments, any responses reported in the latter format were converted to percentages and then calculated into frequency counts based on the number of teacher responses. Furthermore, as not all questions were asked in each trial or calibration, questions are summarised by topic.

Content Appropriateness

This section asked whether the content was appropriate for the age and ability of the students in the teacher's class. Overall, teachers were positive about this, particularly in the reading and writing domains (Table 2). While many teachers simply gave a "yes" response to this question, others were more descriptive. For example, in reference to the mathematics assessments, one teacher wrote, "Yes, the range allowed for all students to be able to answer some questions. Content was familiar to them and their understanding".

Table 2
Appropriateness of content

Number of Responses (Percent)
Subject        Yes        No        Both      No answer   Total
Mathematics    195 (59)   41 (12)   88 (26)   9 (3)       333 (100)
Reading        327 (71)   51 (11)   64 (14)   17 (4)      459 (100)
Writing        350 (72)   52 (11)   72 (15)   9 (2)       483 (100)
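The data-harmonisation step described above (converting responses reported by total number of comments into per-teacher frequency counts) can be sketched as follows. The category labels come from the report; the comment counts and teacher total in the example are illustrative, not actual report data.

```python
def to_frequency_counts(comment_counts: dict[str, int], n_teachers: int) -> dict[str, int]:
    """Convert raw comment counts into frequency counts scaled to the
    number of teacher responses: each category's share of all comments
    is applied to the teacher total."""
    total_comments = sum(comment_counts.values())
    return {
        category: round(n_teachers * count / total_comments)
        for category, count in comment_counts.items()
    }


# Illustrative figures only: 200 comments across the four coding
# categories, rescaled to a notional 100 teacher responses.
counts = {"Yes": 120, "No": 30, "Both yes and no": 40, "No answer": 10}
print(to_frequency_counts(counts, n_teachers=100))
# -> {'Yes': 60, 'No': 15, 'Both yes and no': 20, 'No answer': 5}
```

Rescaling this way keeps the category proportions intact while putting all instances on the common per-teacher basis used in the tables of this report.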
Across all three domains, when a negative or mixed response was indicated, this generally reflected the mixed ability levels in the class or the fact that a small proportion of the students (especially those with a language other than English at home, or with low ability) experienced difficulties. This criticism is an artefact of the method used in assembling test forms for calibration. The flexibility of asTTle test creation means that teachers will be able to assemble tests that are customised for these special cases.

Content Interest and Engagement

Teachers were very positive overall about the interest and engagement of the content in the assessment items and tasks (Table 3). Positive comments included, "All the students found it interesting and wanted to keep doing it" (mathematics assessments); "Yes, children showed a keen interest. Good variety, visually good and the use of different genre" (reading assessments); and "They enjoyed it a lot and found it challenging, especially deciding what to write about" (writing assessments).

The responses in the "both" category tended to reflect an inappropriate difficulty spread in the test form rather than a criticism of the actual content. While most of the class were interested and engaged, usually a small proportion were not because of the mismatch of difficulty to ability. For example, "The children who have ability in maths enjoyed it. But the less able children totally switched off" and "Children engaged at the beginning and started to lose interest when difficulty increased" (reading assessments). Negative comments made up a low proportion overall and were often qualified as being due to external environmental factors, particularly in the first mathematics calibration; for example, "We have just done a lot of tests so they were not focused".

The feedback identified to the asTTle team the benefit of using classroom teachers as the original source of material and tasks.
Just as importantly, it showed the positive impact of using desktop publishing for the illustration and layout of materials: the generous amounts of white space contributed to the children's positive interest and engagement.

Table 3
Content interesting and engaging

Number of Responses (Percent)
Subject        Yes        No        Both      No answer   Total
Mathematics    256 (77)   28 (8)    39 (12)   10 (3)      333 (100)
Reading        368 (80)   36 (8)    38 (8)    17 (4)      459 (100)
Writing        321 (66)   69 (14)   81 (17)   12 (3)      483 (100)

Teacher Instructions

In most of the calibrations, teachers were asked whether the teacher administration instructions were easy to use and adequate (Table 4). Responses were positive across all three domains, with the majority of teachers simply responding "Yes". While a few negative comments were made, these were often suggestions for improving the instructional materials.

Table 4
Teacher Instructions

Number of Responses (Percent)
Subject        Yes        No        Both      No answer   Total
Mathematics    216 (78)   25 (9)    29 (11)   6 (2)       276 (100)
Reading        152 (86)   5 (3)     19 (10)   1 (1)       177 (100)
Writing        144 (80)   18 (10)   16 (9)    1 (1)       179 (100)

This feedback was incrementally implemented across time and has resulted in a concise set of administration guidelines that are delivered to teachers every time asTTle creates a test.

Time Allocation

Teachers in the two initial mathematics trials and the second calibration of the writing assessments were asked whether the amount of time allowed to complete the test form was appropriate. Table 5 shows that almost three-
Student Response

Teachers tended to comment favourably when asked to report the overall response of their students to the asTTle test papers (Table 7). Overall, approximately half the teachers indicated that their classes responded positively to the assessment. For example, "Children commented that it was 'cool' – the diagrams were interesting and kept them focused" (mathematics assessments), and "They enjoyed it and found it easy because they could choose the topic for instructions [a writing task in which students had to write instructions on how to do something] themselves, without being asked to write about something or answer questions about which they have no experience" (writing assessments).

For the mathematics and writing assessments, almost one third of teachers' responses fell in the "both" category, usually reflecting a mixed response from their class. Across all subjects, negative comments often reflected length or difficulty, for example, "The students' response to the paper was that it was too long and boring. When asked why it was boring the answer was they didn't know how to do things, questions were hard and difficult to understand" (mathematics assessments). Again, with the actual asTTle tool, teachers will be able to ensure that the difficulty of the assessment is appropriately set for students.

Table 7
Student responses to paper

Number of Responses (Percent)
Subject        Yes        No         Both       No answer   Total
Mathematics    165 (50)   55 (16)    96 (29)    17 (5)      333 (100)
Reading        285 (62)   113 (25)   52 (11)    9 (2)       459 (100)
Writing        257 (53)   96 (20)    120 (25)   10 (2)      483 (100)

General Comments

When asked if there were any general comments they would like to make, teachers made a variety of responses (Table 8). These were reasonably evenly mixed between positive, negative, both, and suggestions for improvement.
However, a slightly larger proportion of teachers made positive comments regarding the reading assessments, while a slightly larger proportion made negative comments regarding the writing assessments.

Table 8
General comments

Number of Responses (Percent)
Comment                       Mathematics   Reading     Writing
Positive                      44 (14)       91 (38)     83 (19)
Negative                      45 (15)       41 (17)     156 (36)
Both/Neutral                  38 (13)       22 (9)      27 (6)
Suggestions for Improvement   50 (17)       21 (9)      10 (2)
No answer/Uninterpretable     123 (41)      66 (27)     160 (37)
Total                         300 (100)     241 (100)   436 (100)

Concluding Comments

The asTTle test items and materials in reading, writing, and mathematics were well received by both teachers and students in terms of content, interest and engagement, and the instructions supplied to teachers. The test forms were less positively rated in terms of difficulty and time required, with the mathematics assessments somewhat less positively regarded than the literacy ones. The evaluative feedback was used by the development team to adjust the length of time allotted to completing each item, to adjust teacher instructions, and, most importantly, in the design of the asTTle test creation process. The overriding negative feedback concerned inappropriate difficulty in test forms that were not custom-designed for the students of each teacher's class. However, with the asTTle software, teachers will be able to design and administer tests customised to the ability of their classes and students, and thus get around the problem of one size not fitting all.

The evaluative feedback collected by the asTTle development team has been used to improve the quality of the asTTle materials and software. Furthermore, the feedback