Colloquia - British Association for Applied Linguistics


BAAL Conference 2004: 37th Annual BAAL Meeting

Tomlin, R. S. (1995). Focal Attention, Voice, and Word Order: An Experimental Cross-Linguistic Study. In Downing, Pamela and Noonan, Michael (eds.), Word Order in Discourse, pp. 517-554. Amsterdam and Philadelphia: John Benjamins Publishing Co.

The use of corpora in the study of collocation and semantic prosody, exemplified through corpus data from English and Mandarin Chinese

Tony McEnery and Richard Xiao, Department of Linguistics and Modern English Language, Lancaster University
A.McEnery@lancs.ac.uk

This presentation illustrates the use of corpora for the investigation of central issues of semantics and lexicography. The paper studies the collocational behaviour of near synonyms and explores the links between collocation and semantic prosody. The importance of these concepts to language learning is well recognised. Yet while collocation and semantic prosody have recently attracted much interest from researchers studying the English language, little work has been done on collocation and semantic prosody in languages other than English. Still less work has been undertaken contrasting the collocational behaviour of near synonyms and the association of collocation and semantic prosody in different languages. It was to explore issues such as these that the Lancaster Corpus of Mandarin Chinese was developed. Although the corpus is relatively large in comparison to corpora of undescribed languages, the principles underlying its design can serve as guidelines for field linguists aiming to conduct similar analyses. Using the Mandarin corpus, we were able to undertake a cross-linguistic analysis of pattern and meaning, drawing upon data from English and Chinese.
Our main analytical findings are as follows: (1) corpus data is useful in that it shows the central tendency and typical attested use of a word; (2) semantic prosody is closely associated with collocation and text type; and (3) even typologically distinct languages may demonstrate striking similarities in collocational pattern and semantic prosody. We conclude the talk by addressing the more general question of how to create corpora, for our own and for similar research questions, for languages for which only limited amounts of data are available.

An overview of the design and encoding principles underlying the British National Corpus project and how they can be adapted to field-based corpora

Lou Burnard, Oxford University Computing Services, University of Oxford
lou.burnard@computing-services.oxford.ac.uk

The British National Corpus is designed to represent as wide a range of modern British English as possible. The same goal should ideally underlie the creation of smaller, field-based corpora as well. Therefore, this paper introduces the design and encoding decisions underlying the British National Corpus with the aim of extrapolating as many design principles for the creation of smaller, field-based corpora as possible. The paper starts with an overview of the genres, both written and oral, represented in the corpus and gives recommendations on gathering a representative sample for a much smaller corpus, balanced with respect to genre and to the age, region and social class of the speakers. The paper further introduces the encoding principles of the British National Corpus. The corpus is encoded according to the guidelines of the Text Encoding Initiative (TEI). Each text in the corpus is segmented into orthographic sentence units, within which each word is automatically assigned a word class (part of speech) code. Segmentation and word classification were carried out automatically by the CLAWS stochastic part-of-speech tagger developed at the University of Lancaster.
The classification scheme used for the corpus distinguishes some 65 parts of speech, which are described in the accompanying documentation. Suggestions are made as to what extent the encoding principles can be adapted to corpora that will mainly be tagged by hand, and to what degree it is possible to create similar automatic taggers for less-described languages. Finally, it is illustrated how such a general corpus following internationally agreed standards can be used to answer a variety of research questions, ranging from lexicography to literary studies.

The possible contributions of field-based corpora to syntactic theory, corpus linguistics, sociolinguistics, and anthropological and applied linguistics

Friederike Luepke, Endangered Languages Academic Programme, SOAS
fl2@soas.ac.uk

Corpus studies are in the overwhelming majority of cases limited to the investigation of languages for which large corpora are available. Field-based investigations, especially when presenting first descriptions of hitherto undocumented languages, generally rely on qualitative analyses only.

King's College, London, 9–11th September, 2004
Nevertheless, even small field-based corpora can contribute substantially to different subdisciplines of linguistics. This presentation introduces a field-based corpus study based on a small tagged corpus of natural speech from Jalonke, a West African Mande language of the Niger-Congo phylum spoken in Guinea. The corpus study serves to illustrate the importance of quantitative data for the following domains:

- Theories of argument structure and argument realisation. There is considerable debate as to whether the number of arguments with which a given verb appears is determined at the lexical or the constructional level. The Jalonke data show that this question cannot be answered universally on the basis of qualitative inspection, as is generally attempted, but that language-particular features and quantitative data need to be taken into account.
- The corroboration of language-particular genres established on the basis of culture-specifically recognised speech events, and the comparability of the linguistic features of these genres, such as the frequency of passives, imperatives, etc., across languages.
- The identification and differentiation of sociolects according to age, sex, social background, etc., and a differentiated quantification of code switching.
- The recognition of genres and registers featuring formulaic and ritualistic speech.
- The identification of those genres and registers most suitable as an input for language planning, standardisation, the development of teaching materials, etc.

The paper ends with an appeal to corpus linguists and psycholinguists to collaborate in the adaptation of quantitative methods to field-based corpora. Corpora are gaining more and more importance in the new field of language documentation.
However, field linguists alone cannot develop adequate standards and techniques on top of their 'traditional' workload: they urgently need to co-operate with specialists interested in the specific problems posed by smaller field-based corpora.

The use of elicitation games in corpus studies

Sonja Eisenbeiss, Department of Language and Linguistics, University of Essex
seisen@essex.ac.uk
Ayumi Matsuo, Max Planck Institute for Psycholinguistics, Nijmegen
ayumi.matsuo@mpi.nl

Many field workers aim for natural communicative settings and thus simply observe and record communicative events or involve their participants in staged communicative events (e.g. discussions about a given topic). However, many constructions or grammatical markers are so infrequent that a detailed distributional analysis is impossible. This problem is even more pronounced for languages with argument ellipsis (e.g. Japanese), where many verbs are hardly ever used with the full set of possible arguments. Moreover, in spontaneous speech samples, it is often hard to determine intended meanings or referents. Therefore, many researchers make use of "classical" linguistic elicitation or experimentation. However, such data exhibit experimental effects and often cannot be analysed for phenomena not targeted in the respective study. We argue for the use of so-called elicitation games, which involve comparatively natural communicative events that favour, but are not limited to, the use of particular constructions or markers. The effects of argument ellipsis, for instance, are reduced by providing discourse contexts in which speakers have to refer explicitly to all participants of the event they are talking about. We have developed a puzzle game in which children describe pictures on a puzzle board in order to get the puzzle pieces with the matching pictures. The overt realisation of verb arguments is encouraged by using the same action but different participants for all pictures on the board (e.g.
a man giving honey to a bear vs. a man giving honey to a cat vs. a woman giving a mouse to a bear). We present child language data from a comparative study on argument structure and case marking in German and Japanese. Based on this data set, we show that the use of elicitation games provides a quantitatively rich database that can be exploited for a variety of purposes and provides a sufficient, ecologically valid foundation for quantitative and distributional analyses. In addition, we discuss the use of elicitation games for different types of constructions and grammatical markers in cross-linguistic and cross-cultural studies.

Three reasons to record speech-accompanying gestures when documenting an endangered language

Sotaro Kita, Department of Experimental Psychology, University of Bristol
Sotaro.kita@bristol.ac.uk