25.08.2013 Views

PDF (Online Text) - EURAC


Parallel corpora of Orwell’s 1984 annotated in CES with morpho-syntactic information in ten Central and Eastern European languages. (http://nl.ijs.si/ME/V2/)

• Analysis:

Delphin (lgs. > 5); HPSG grammars:
HPSG grammars for NLP applications, together with various tools for running and developing HPSG resources. (http://www.delph-in.net/)

AGFL (lgs. > 4); parser and grammars:
A description of natural languages with context-free grammars. (http://www.cs.ru.nl/agfl/)

• Generation:

KPML (lgs. > 10); text generation system:
Systemic-functional grammars for natural language generation. (http://purl.org/net/kpml)

• Machine Translation:

OpenLogos (lgs. > 4); Machine Translation software and data:
An open-source version of the Logos Machine Translation System, to which new language pairs can be added. (http://logos-os.dfki.de/)

3. Strategies and Recommendations for Developers

If there is no pool of free software data that matches your own data, you should try the following: 1) convert your data into free software, which gives you a greater chance that others will copy it and take care of it; and 2) modify your data so that it can be pooled with other data. This may require only a minor change in the format of the data, which a script can make automatically. Alternatively, build a community that will, in the long term, create a pool. In general, this means that you separate the procedural components (tagger, spelling checker, parser, etc.) from the static linguistic data, make the procedural components freely available, and describe the format of the static linguistic data. One example is Kevin Scannell’s CRUBADAN, a web crawler for the construction of word lists for ISPELL. The author succeeded in creating a community around his tool that develops spell checkers for more than thirty Small Languages (cf. http://borel.slu.edu/crubadan). Through this split between declarative (linguistic) components on the one hand and procedural components (programs) on the other, many pools come with adequate tools to create and maintain the data.
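The kind of automatic format change mentioned above can be illustrated with a minimal sketch. It assumes a hypothetical in-house frequency list with one `word<TAB>count` entry per line and converts it into a plain, sorted list with one word per line. Both layouts, the function name `to_word_list`, and the `min_count` parameter are assumptions made for this illustration; they are not the formats or interfaces of ISPELL, CRUBADAN, or any other tool named here.

```python
from typing import Iterable, List


def to_word_list(lines: Iterable[str], min_count: int = 1) -> List[str]:
    """Convert 'word<TAB>count' lines into a sorted, de-duplicated word list.

    The input and output layouts are hypothetical and serve only to
    illustrate a scripted format conversion.
    """
    words = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        parts = line.split("\t")
        word = parts[0].strip()
        # Assumed rule: a missing or malformed count is treated as 1.
        try:
            count = int(parts[1]) if len(parts) > 1 else 1
        except ValueError:
            count = 1
        if word and count >= min_count:
            words.add(word)
    return sorted(words)


if __name__ == "__main__":
    # The static linguistic data stays separate from the procedure above.
    sample = ["# demo data", "teach\t12", "abhaile\t3", "teach\t5", "bosca\t1"]
    print("\n".join(to_word_list(sample, min_count=3)))
```

The function holds only the procedure, while the word data is passed in from outside. This mirrors the separation of procedural components from static linguistic data recommended above, and the docstring serves as a small description of the data format.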

