25.10.2012 Views

Laurie Bauer - WordPress.com — Get a Free Blog Here



THE LINGUISTICS STUDENT’S HANDBOOK 86

which can be recorded but which involve rather less unnatural production of speech than is provided in reading tasks: role playing, doing a ‘map task’ (where two participants are given partial maps showing some of the same features and are asked to reconstruct some route across the map, without seeing each other’s maps), and similar exercises. The difficulty here is often to create a meaningful task which will produce the requisite language behaviour, but the data produced can be very valuable.

Electronic corpora

While it is possible to create one’s own electronic corpus and to annotate it in any way desired, it will be assumed in what follows that the use of electronic corpora involves the analysis of one of the standard corpora now more and more readily available. The great benefit of corpora is that somebody has already collected a number of texts, probably with some attempt at representativeness, and has already done the transcription in the case of spoken texts.

Electronic corpora thus provide some of the best features of literary and non-literary texts and sound recordings, with the added advantage that they are relatively easy to search (or, in most cases, are easy to search as long as the search can be carried out in terms of specific lexical material). Some corpora have also been tagged, i.e. marked with information about the word classes of the items in the texts. The best corpora of spoken language can now link the transcriptions direct to the sound files and to the files containing speaker information, so that it is possible to search for an occurrence of /e/ before /l/ spoken by a woman. Where part-of-speech tagging has been manually checked it is more useful than when it has been done entirely automatically, in which case the analyst has to be aware that there may be errors in the labels assigned. A very few corpora have also had the syntactic constructions in them analysed and marked.

As well as those collections of texts put together specifically for the use of linguists, there is a growing number of electronically searchable bodies of text, which may also be of value. Many newspapers are now republished retrospectively on CD, and there are collections of literary and non-literary texts from various periods. The largest body of electronically searchable text is provided by the world-wide web. Ironically, given that one of the problems with newspapers as sources of data is that there may be editorial interference, one of the major problems with the web is that there is no editorial control, and that spelling mistakes and syntactic errors abound. For example, a query on ‘few person’ will not only turn up reference to ‘a few person-hours’ and the like, but will also provide examples such as ‘Is it possible that a few person participate using the same computer?’ Also, a search of the web may turn up several occurrences of the same document, and thus apparently inflate the occurrence of a particular structure. Despite such problems, the web is an invaluable source of data on
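The point that repeated copies of one document can inflate the apparent frequency of a structure can be illustrated with a minimal Python sketch. It is offered only as an illustration, not as a description of any real search engine or corpus tool: the function names `normalise` and `count_hits` and the three sample 'search results' are hypothetical, and only exact copies (after whitespace and case are evened out) are treated as duplicates.

```python
import hashlib
import re


def normalise(text):
    """Lower-case the text and collapse runs of whitespace,
    so that trivially different copies of a document compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def count_hits(documents, phrase, deduplicate=True):
    """Count occurrences of `phrase` across `documents`.

    With deduplicate=True, a document whose normalised text has
    already been seen is skipped, so copies do not inflate the count.
    """
    pattern = re.compile(r"\b" + re.escape(phrase.lower()) + r"\b")
    seen = set()
    total = 0
    for doc in documents:
        text = normalise(doc)
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if deduplicate and digest in seen:
            continue  # an exact copy of an earlier document
        seen.add(digest)
        total += len(pattern.findall(text))
    return total


# Hypothetical search results: the third is a copy of the first,
# differing only in spacing.
results = [
    "Is it possible that a few person participate using the same computer?",
    "The job took a few person-hours.",
    "Is it possible that a few person  participate using the same computer?",
]

raw = count_hits(results, "few person", deduplicate=False)
unique = count_hits(results, "few person")
print(raw, unique)
```

Note that the hit in 'a few person-hours' is still counted in both totals: removing duplicates addresses only the inflation problem, and the analyst must still inspect each hit to decide whether it is a genuine example of the structure under investigation.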
