
Colloquia - British Association for Applied Linguistics


BAAL Conference 2004, 37th Annual BAAL Meeting

Tomlin, R. S. (1995). Focal Attention, Voice, and Word Order: An Experimental Cross-Linguistic Study. In Downing, Pamela and Noonan, Michael (eds.), Word Order in Discourse, pp. 517-554. Amsterdam and Philadelphia: John Benjamins Publishing Co.

The use of corpora in the study of collocation and semantic prosody, exemplified through corpus data from English and Mandarin Chinese

Tony McEnery and Richard Xiao
Department of Linguistics and Modern English Language, Lancaster University
A.McEnery@lancs.ac.uk

This presentation illustrates the use of corpora for the investigation of central issues of semantics and lexicography. The paper studies the collocational behaviour of near synonyms and explores the links between collocation and semantic prosody. The importance of these concepts to language learning is well recognised. Yet while collocation and semantic prosody have recently attracted much interest from researchers studying the English language, little work has been done on collocation and semantic prosody in languages other than English. Still less work has been undertaken contrasting the collocational behaviour of near synonyms and the association of collocation and semantic prosody in different languages. It was to explore issues such as these that the Lancaster Corpus of Mandarin Chinese was developed. Although the corpus is relatively large in comparison to corpora of undescribed languages, the principles underlying its design can serve as guidelines for field linguists aiming to conduct similar analyses. Using the Mandarin corpus, we were able to undertake a cross-linguistic analysis of pattern and meaning, drawing upon data from English and Chinese. Our main analytical findings are as follows: (1) corpus data is useful in that it shows the central tendency and typical attested use of a word; (2) semantic prosody is closely associated with collocation and text type; and (3) even typologically distinct languages may demonstrate striking similarities in collocational pattern and semantic prosody. We conclude the talk by addressing the more general question of how to create corpora, for our own and for similar research questions, in languages for which only limited amounts of data are available.

An overview of the design and encoding principles underlying the British National Corpus project and how they can be adapted to field-based corpora

Lou Burnard
Oxford University Computing Services, University of Oxford
lou.burnard@computing-services.oxford.ac.uk

The British National Corpus is designed to represent as wide a range of modern British English as possible. The same goal should ideally underlie the creation of smaller, field-based corpora as well. Therefore, this paper introduces the design and encoding decisions underlying the British National Corpus with the aim of extrapolating as many design principles for the creation of smaller and field-based corpora as possible. The paper starts with an overview of the genres, both written and oral, represented in the corpus and gives recommendations on gathering a representative sample for a much smaller corpus, balanced with respect to genre and to the age, region and social class of the speakers. The paper further introduces the encoding principles of the British National Corpus. The corpus is encoded according to the guidelines of the Text Encoding Initiative (TEI). Each text in the corpus is segmented into orthographic sentence units, within which each word is automatically assigned a word class (part of speech) code. Segmentation and word classification were carried out automatically by the CLAWS stochastic part-of-speech tagger developed at the University of Lancaster. The classification scheme used for the corpus distinguishes some 65 parts of speech, which are described in the accompanying documentation. Suggestions are made as to how far the encoding principles can be adapted to corpora that will mainly be tagged by hand, and to what degree it is possible to create similar automatic taggers for less-described languages. Finally, the paper illustrates how such a general corpus, following internationally agreed standards, can be used to answer a variety of research questions, ranging from lexicography to literary studies.

The possible contributions of field-based corpora to syntactic theory, corpus linguistics, sociolinguistics, and anthropological and applied linguistics

Friederike Luepke
Endangered Languages Academic Programme, SOAS
fl2@soas.ac.uk

In the overwhelming majority of cases, corpus studies are limited to the investigation of languages for which large corpora are available. Field-based investigations, especially those presenting first descriptions of hitherto undocumented languages, generally rely on qualitative analyses only.

King's College, London, 9-11 September 2004
