Colloquia - British Association for Applied Linguistics
BAAL Conference 2004 37th Annual BAAL Meeting
Tomlin, R. S. (1995). Focal Attention, Voice, and Word Order: An Experimental Cross-Linguistic Study.
In Downing, Pamela and Noonan, Michael (eds.), Word Order in Discourse, pp. 517-554. Amsterdam
and Philadelphia: John Benjamins Publishing Co.
The use of corpora in the study of collocation and semantic prosody, exemplified through
corpus data from English and Mandarin Chinese
Tony McEnery and Richard Xiao,
Department of Linguistics and Modern English Language, Lancaster University
A.McEnery@lancs.ac.uk
This presentation illustrates the use of corpora for the investigation of central issues of semantics and
lexicography. The paper studies the collocational behaviour of near synonyms and explores the links
between collocation and semantic prosody. The importance of these concepts to language learning is
well recognised. Yet while collocation and semantic prosody have recently attracted much interest
from researchers studying the English language, little work has been done on collocation and
semantic prosody in languages other than English. Still less work has been undertaken contrasting
the collocational behaviour of near synonyms and the association of collocation and semantic prosody
in different languages. It was to explore issues such as these that the Lancaster Corpus of Mandarin
Chinese was developed. Although the corpus is relatively large in comparison to corpora of undescribed
languages, the principles underlying its design can serve as guidelines for field linguists aiming to
conduct similar analyses. Using the Mandarin corpus, we were able to undertake a cross-linguistic
analysis of pattern and meaning, drawing upon data from English and Chinese. Our main analytical
findings are as follows: (1) corpus data is useful in that it shows the central tendency and typical
attested use of a word; (2) semantic prosody is closely associated with collocation and text type; and
(3) even typologically distinct languages may demonstrate striking similarities in collocational pattern
and semantic prosody. We conclude the talk by addressing the more general question of how to
create corpora for our research questions, and for similar ones, in languages for which only limited
amounts of data are available.
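As a rough illustration of what a collocation analysis of this kind involves, the sketch below counts the words that occur within a fixed window around a node word and scores one node-collocate pair with a mutual information measure. It is not the authors' procedure: the function names, the window size, the particular MI formulation (one common variant) and the toy sentence are all assumptions made for the example.

```python
import math
from collections import Counter

def collocates(tokens, node, window=4):
    """Count the words found within +/- `window` tokens of each `node` hit."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts

def mutual_information(tokens, node, collocate, window=4):
    """MI = log2(observed / expected), with the expected co-occurrence
    estimated as f(node) * f(collocate) * span / corpus size (an assumption)."""
    freq = Counter(tokens)
    observed = collocates(tokens, node, window)[collocate]
    if observed == 0:
        return float("-inf")
    span = 2 * window
    expected = freq[node] * freq[collocate] * span / len(tokens)
    return math.log2(observed / expected)

# Toy data, invented for illustration only.
tokens = "the plan caused serious problems and the delay caused serious damage".split()
print(collocates(tokens, "caused", window=2).most_common(3))
print(mutual_information(tokens, "caused", "serious", window=2))
```

On a real corpus, the recurring collocates of a node word (and whether they are predominantly favourable or unfavourable) are the raw material for statements about its semantic prosody; the same counting applies to any tokenised language data.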
An overview of the design and encoding principles underlying the British National Corpus
project and how they can be adapted to field-based corpora
Lou Burnard
Oxford University Computing Services, University of Oxford
lou.burnard@computing-services.oxford.ac.uk
The British National Corpus is designed to represent as wide a range of modern British English as
possible. The same goal should ideally underlie the creation of smaller, field-based corpora as well.
Therefore, this paper introduces the design and encoding decisions underlying the British National
Corpus with the aim of extrapolating as many design principles for the creation of smaller, field-based
corpora as possible. The paper starts with an overview of the genres, both written and oral,
represented in the corpus and gives recommendations on gathering a representative sample,
balanced with respect to genre and to the age, region and social class of the speakers, for a much smaller
corpus. The paper further introduces the encoding principles of the British National Corpus. The
corpus is encoded according to the guidelines of the Text Encoding Initiative (TEI). Each text in the
corpus is segmented into orthographic sentence units, within which each word is
assigned a word class (part of speech) code. Segmentation and word classification were carried out
automatically by the CLAWS stochastic part-of-speech tagger developed at the University of
Lancaster. The classification scheme used for the corpus distinguishes some 65 parts of speech,
which are described in the accompanying documentation. Suggestions are made as to what extent the
encoding principles can be adapted to corpora that will mainly be tagged by hand and to what degree
it is possible to create similar automatic taggers for less-described languages. Finally, the paper illustrates
how such a general corpus following internationally agreed standards can be used to answer a variety
of research questions, ranging from lexicography to literary studies.
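For a small, hand-tagged corpus, the encoding described above (sentence units containing words that each carry a word class code) might be sketched as follows. The element names `s` and `w`, the attribute names `n` and `pos`, and the tag labels are assumptions chosen for illustration; they are not claimed to reproduce the British National Corpus schema or the CLAWS tagset.

```python
import xml.etree.ElementTree as ET

def encode_sentence(tagged_words, n):
    """Wrap one hand-tagged sentence in a numbered <s> element,
    with one <w> element per word carrying its word class code."""
    s = ET.Element("s", {"n": str(n)})
    for form, pos in tagged_words:
        w = ET.SubElement(s, "w", {"pos": pos})
        w.text = form
    return s

# Hypothetical hand-tagged sentence; the tag labels are placeholders.
sentence = [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")]
encoded = ET.tostring(encode_sentence(sentence, 1), encoding="unicode")
print(encoded)
```

Keeping the word class code as an attribute rather than as part of the text means the same markup can hold either manually assigned or automatically assigned tags.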
The possible contributions of field-based corpora to syntactic theory, corpus linguistics,<br />
sociolinguistics, and anthropological and applied linguistics<br />
Friederike Luepke<br />
Endangered Languages Academic Programme, SOAS<br />
fl2@soas.ac.uk<br />
Corpus studies are in the overwhelming majority of cases limited to the investigation of languages for
which large corpora are available. Field-based investigations, especially when presenting first<br />
descriptions of so far undocumented languages, generally rely on qualitative analyses only.<br />
King's College, London, 9 – 11th September, 2004