Computer-based and paper-based writing assessment: a ...

2 | CAMBRIDGE ESOL : RESEARCH NOTES : ISSUE 34 / NOVEMBER 2008

of mode of composition, by giving students the choice of mode in which to compose their essay, increases the effect of mode of presentation. Raters may be influenced, both positively and negatively, by the appearance of essays in handwritten versus typed text. Studies by Powers, Fowles, Farnum and Ramsey (1994) and Russell and Tao (2004a) found that essays presented in handwritten form receive higher scores than the same essays presented as computer text. The authors suggested a number of hypotheses to account for this: increased visibility of errors in the computer text; higher expectations for computer texts; handwriting enabling the reader to feel closer to the writer; handwritten responses being given the benefit of the doubt when hard to read; and handwritten responses appearing longer and the result of greater effort.

Examiner training and standardisation can help ensure that any presentation effects are minimised as far as possible. In a small follow-up study, Russell and Tao (2004b) were able to demonstrate that the presentation effect could be eliminated with supplementary training. In order to train raters in the presentation effect and how to avoid it, we need to look at differences in appearance between the texts and how they arise.

Methodology

Authentic test data was obtained in order that the results could be directly applied to an assessment context; thus a PET part three writing task common to both a live PB and a live CB examination session was chosen.
This task is a longer piece of continuous writing (100 words) in response to an extract from an informal letter written by a friend. This extract provides the topic candidates must write about, with a couple of questions included to focus their ideas. The task chosen was on the topic of sport.

The study focused on whether the written output from an assessment task was comparable in terms of linguistic and text features when produced via a paper- or computer-based administration mode. The following features were studied:

• Lexical resources: text length, standardised type-token ratio, lexical sophistication, comparison of candidate output and PET wordlists
• Organisation in terms of sentences and paragraphs
• Surface features: punctuation, capitalisation and use of text/email conventions
• Frequency of typographical/spelling errors that do and do not impede comprehension.

The methodology was finalised after a pilot study (not described here). It is important to note that the CB participants had no access to word-processing tools such as spell- or grammar checkers. Analysis was conducted using Wordsmith Tools (Scott 1998), Range (Nation and Heatley 1996) and SPSS. The linguistic features studied were based on sub-sections of the markscheme, although the range and accuracy of grammar were not studied here. Scripts were collected from the two exam sessions and candidates selected so that they were matched on exam centre and language proficiency.
This meant that sampled candidates would have had the choice of administration mode, thus selecting the mode most suitable to them, a condition suggested by Russell and Plati (2002:21). Thus any possible impact of computer familiarity or anxiety on the test output is likely to be reduced. Candidates from four examination centres were selected: one Colombian, two Italian, and one Swiss centre. The total number in the PB group was 86 and the total number in the CB group was 82.

Results

We present the results in four sections below, relating to lexical resources, sentences and paragraphs, surface features and spelling errors.

Lexical resources

Analysis of the two groups using Wordsmith Tools revealed that PB texts were on average 4 words longer; this group also had a greater standard deviation (19.4), suggesting greater variability between the candidates (see Table 1).

Table 1: Lexical output and variation by administration mode

Administration mode                     PB              CB
                                    Mean    SD      Mean    SD
Tokens                               108    19.4     104    15.4
Types                                 69    11.9      69     8.7
Standardised Type/Token Ratio         77     5.4      79     5.3
Average Word Length                    4     0.2       4     0.3

A t-test revealed no significant difference in the mean number of words written between the two groups (t=-1.294, df=154, p=0.197). Thus there is insufficient evidence to suggest a difference in text length between the administration modes.

Many of the studies conducted on direct writing show that participants writing in CB mode generally write more (Goldberg et al 2000, Li 2006, MacCann et al 2002); these findings run contrary to those of the present study. It must be remembered that this task does have a word limit: candidates are asked to write 100 words. This, and the fact that it is a timed situation, will impact on the number of words written.
In addition, the CB candidates have a word count available to them, so they can be more accurate in achieving this target, which may explain the lower standard deviation for CB texts.

Interestingly, the mean number of types was the same for both groups (69), although there was greater variation in the PB scripts (SD 11.9) (see Table 1). This would suggest that candidates were using a similar range of vocabulary. However, the PB candidates used more words; this would imply that in the PB test candidates were recycling some words rather than using the 'extra words' to add more breadth to the composition. The standardised type-token ratio confirms this; the mean ratio for the CB group is higher (79 compared with 77). A t-test revealed a significant difference in the mean standardised type-token ratio between the two groups (t=2.4, df=165, p=0.019). Thus

© UCLES 2008 – The contents of this publication may not be reproduced without the written permission of the copyright holder.
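As an illustration of the computations behind these figures, the sketch below derives a standardised type-token ratio per script and compares two groups with a t statistic. This is an illustrative reconstruction, not the study's tooling (the authors used WordSmith Tools and SPSS); the 50-word chunk size and the use of Welch's formula are assumptions chosen here to suit scripts of roughly 100 words.

```python
from math import sqrt
from statistics import mean, stdev

def sttr(text, chunk_size=50):
    """Type-token ratio (%) averaged over successive fixed-size chunks.

    Assumption: WordSmith-style standardised TTR, with the chunk size
    reduced from WordSmith's default to fit ~100-word scripts.
    """
    tokens = text.lower().split()
    ratios = [
        100 * len(set(tokens[i:i + chunk_size])) / chunk_size
        for i in range(0, len(tokens) - chunk_size + 1, chunk_size)
    ]
    if not ratios:  # text shorter than one chunk: fall back to plain TTR
        ratios = [100 * len(set(tokens)) / len(tokens)]
    return mean(ratios)

def welch_t(a, b):
    """Welch's independent-samples t statistic for two lists of scores
    (the exact t-test variant used in the study is not stated)."""
    return (mean(a) - mean(b)) / sqrt(
        stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)
    )
```

Applied per candidate, `sttr` yields ratios of the kind summarised in Table 1, and `welch_t` over the two groups' per-script values gives the sort of group comparison reported above.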
