Programme booklet (pdf)
PRESENTATION ABSTRACTS<br />
Abstract<br />
Search in the Lassy Small Corpus<br />
van Noord, Gertjan and de Kok, Daniel and van der Linde, Jelmer<br />
University of Groningen<br />
A few months ago, the STEVIN Lassy project yielded its most important results: Lassy<br />
Small - a corpus of 1 million words with syntactic annotations which have been<br />
manually verified and corrected, and Lassy Large - a corpus of 1.5 billion words with<br />
automatically assigned syntactic structures. Syntactic annotations include part-of-speech<br />
tags, lemma and dependency annotations of the type developed earlier in CGN<br />
and D-Coi.<br />
In this presentation we focus on the Lassy Small corpus, and introduce a stand-alone<br />
portable tool called DACT which can be used to browse the syntactic annotations in an<br />
attractive graphical form, and to search for sentences according to a number of search<br />
criteria, which can be specified elegantly by means of search queries formulated in<br />
XPATH, the W3C standard query language for XML documents. We provide a number<br />
of linguistically relevant examples of such queries, and we review the criticism by Lai<br />
and Bird (2010), which they take as motivation for introducing LPATH, an extension of<br />
XPATH. We will argue that such an extension is not required if string positions are<br />
explicitly encoded as XML attributes, as is the case in Lassy Small.<br />
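To illustrate the point about string positions, here is a minimal sketch in Python. It is not taken from DACT or from the Lassy Small distribution: the toy tree, the element name `node`, and the attribute names `rel`, `pos`, `word`, `begin` and `end` are assumptions chosen for illustration. With positions stored as attributes, a word-order constraint such as "the subject follows the head verb" reduces to a numeric comparison. In full XPath 1.0 this could be written roughly as `//node[@rel="su" and @begin > ../node[@rel="hd"]/@begin]`. Python's standard-library ElementTree implements only a subset of XPath, so the sketch selects nodes with XPath and performs the comparison in Python.

```python
import xml.etree.ElementTree as ET

# Toy dependency tree for "Gisteren sliep Jan" (subject after the finite verb).
# Element and attribute names are assumptions made for this illustration.
TOY_XML = """
<alpino_ds>
  <node cat="smain" rel="--" begin="0" end="3">
    <node rel="mod" pos="adv" word="Gisteren" begin="0" end="1"/>
    <node rel="hd" pos="verb" word="sliep" begin="1" end="2"/>
    <node rel="su" pos="noun" word="Jan" begin="2" end="3"/>
  </node>
</alpino_ds>
"""


def subjects_after_head(root):
    """Return the words of subject nodes that start after a sibling head node."""
    hits = []
    for clause in root.iter("node"):
        # ElementTree's XPath subset can select children by attribute value ...
        heads = clause.findall("./node[@rel='hd']")
        subjects = clause.findall("./node[@rel='su']")
        for su in subjects:
            for hd in heads:
                # ... but the numeric position comparison is done in Python.
                if int(su.get("begin")) > int(hd.get("begin")):
                    hits.append(su.get("word"))
    return hits


root = ET.fromstring(TOY_XML)
print(subjects_after_head(root))
```

An XPath 1.0 engine that supports comparisons inside predicates could evaluate the single expression given above directly, without the Python loop.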
DACT is freely available for various platforms, including Mac OS and recent versions of<br />
Windows.<br />
Corresponding author: g.j.m.van.noord@rug.nl<br />