25.08.2013 Views

PDF (Online Text) - EURAC


dictionaries (cf. Bortolotti & Rasom) meant to improve the language skills of native speakers.

Several authors contribute papers on tools for the elaboration and storage of language data, for text analysis, and for the processing of text material with a view to the development of corpora. Puddu points out the importance of corpora in supporting the development of lesser used languages and outlines the main problems connected to corpus design, text collection, storage and annotation for a lesser used language such as Sardinian (cf. Puddu). Sardinian, like any other lesser used language, has to cope with the difficulty of retrieving written text and, in this specific case, with a second problem: the absence of a standard orthography. The application of a homogeneous tag system is suggested, as is the use of storage standards such as the rules elaborated by the EAGLES group (XCES).

Prinsloo and Heid describe methodologies, such as the bootstrapping of resources, for elaborating language documentation and annotation. They describe the development of different tools to bootstrap tagging resources for Northern Sotho, and resources used to identify verbs and nouns for the disambiguation of closed class items. The Bantu languages and their characteristics are also discussed in the contribution by Taljard and Bosch, who present the problems encountered when dealing with languages with different writing systems, in this case Northern Sotho and Zulu. The authors describe the distinct approaches to class tagging that the different writing systems call for.

Examples of knowledge extraction and knowledge engineering are discussed in the paper on the FAME project, an Interlingual Speech-to-Speech Machine Translation System for Catalan, English and Spanish developed to assist users in making hotel reservations. The project includes tools for the documentation of data and elaboration of the standard Interchange Format (IF).

It is clear from these contributions that a variety of approaches and scientific methodologies are now adopted in research on lesser used languages, which shows the vitality of research in this specific area.

Thanks to authors who cover a large variety of projects and technologies, an overview of the state of the art in research on lesser used languages can be provided, especially as regards projects involving computational linguistics in Europe and the world. Central to the conference are both methodological issues, prompted by the described strategies for efficient support of lesser used languages, and the problems encountered with theoretical approaches developed for major languages but applied to lesser used languages.

