12.09.2013 Views

Programme booklet (pdf)


PRESENTATION ABSTRACTS

Abstract

Search in the Lassy Small Corpus

van Noord, Gertjan and de Kok, Daniel and van der Linde, Jelmer

University of Groningen

A few months ago, the STEVIN Lassy project yielded its most important results: Lassy Small - a corpus of 1 million words with syntactic annotations which have been manually verified and corrected, and Lassy Large - a corpus of 1.5 billion words with automatically assigned syntactic structures. Syntactic annotations include part-of-speech tags, lemma and dependency annotations of the type developed earlier in CGN and D-Coi.

In this presentation we focus on the Lassy Small corpus and introduce a stand-alone portable tool called DACT, which can be used to browse the syntactic annotations in an attractive graphical form and to search for sentences according to a number of search criteria. These criteria can be specified elegantly by means of search queries formulated in XPATH, the WWW standard query language for XML documents. We provide a number of linguistically relevant examples of such queries, and we review the criticism of Lai and Bird (2010), which they take as motivation to introduce LPATH, an extension of XPATH. We will argue that such an extension is not required if string positions are explicitly encoded as XML attributes, as is the case in Lassy Small.
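
To illustrate why encoded string positions can stand in for a dedicated "immediately follows" operator, here is a minimal sketch in Python. The toy XML, its attribute names (`rel`, `cat`, `pt`, `begin`, `end`, `word`, `lemma`) and the helper functions are assumptions made for illustration; they are a simplified stand-in for the corpus format, not a copy of it, and the code is not how DACT itself is implemented. Python's standard-library `xml.etree.ElementTree` supports only a subset of XPath, so the structural part of the query is written as a path expression and the position comparison is done in Python. In a full XPath 1.0 engine the same condition could plausibly be written as a single predicate, for example `//node[@rel="hd" and @end = ../node[@rel="obj1"]/@begin]`.

```python
import xml.etree.ElementTree as ET

# Toy dependency tree in a simplified, assumed shape: every node records the
# string positions it spans in its begin/end attributes.
SENTENCE = """
<alpino_ds>
  <node cat="smain" rel="--" begin="0" end="4">
    <node rel="su" pt="n" begin="0" end="1" word="Jan" lemma="Jan"/>
    <node rel="hd" pt="ww" begin="1" end="2" word="leest" lemma="lezen"/>
    <node cat="np" rel="obj1" begin="2" end="4">
      <node rel="det" pt="lid" begin="2" end="3" word="een" lemma="een"/>
      <node rel="hd" pt="n" begin="3" end="4" word="boek" lemma="boek"/>
    </node>
  </node>
</alpino_ds>
"""


def immediately_follows(left, right):
    """True if `right` starts at the string position where `left` ends."""
    return int(left.get("end")) == int(right.get("begin"))


def heads_directly_before_object(root):
    """Return the words of heads whose sibling direct object is string-adjacent."""
    hits = []
    for parent in root.iter("node"):
        # Structural part of the query: head and direct-object children.
        heads = parent.findall("node[@rel='hd']")
        objects = parent.findall("node[@rel='obj1']")
        # Linear-order part: compare the encoded string positions.
        for head in heads:
            for obj in objects:
                if immediately_follows(head, obj):
                    hits.append(head.get("word"))
    return hits


root = ET.fromstring(SENTENCE)
print(heads_directly_before_object(root))
```

The linear-order constraint reduces to an ordinary comparison of attribute values, which is the kind of condition a standard XPath predicate can already express.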

DACT is freely available for various platforms, including Mac OS and recent versions of Windows.

Corresponding author: g.j.m.van.noord@rug.nl
