02.05.2014 Views

Proceedings - Österreichische Gesellschaft für Artificial Intelligence

Preface

The need for automatically generated and processed linguistic resources has increased dramatically in recent years, because linguistic resources are used in a wide variety of applications, including machine translation, information retrieval and extraction, sentiment analysis, lexicon creation, and linguistic applications of machine learning.

Building linguistic resources is generally expensive and time-consuming, and requires highly specialized skills. It is important, therefore, for these resources to be reusable and interoperable, for the benefit of the commercial and research communities. This would allow the same resources to be applied in different applications and possibly in different research areas. One fundamental obstacle to achieving this is the standardization of linguistic resources.

Language resource standards define, for example, how linguistic data can be created, imported and integrated in a platform-independent way. The relevant standardization activities are currently conducted by ISO/TC 37/SC 4, an international working group (‘sub-committee’) of the International Organization for Standardization (ISO).

The objective of subcommittee ISO/TC 37/SC 4 is to develop international standards and guidelines for effective language resource management in mono- and multilingual applications. It also develops principles and methods for representations and annotations of data, for the creation of categories for thesauri, ontologies, morphological and syntactic analysis and so on.

Subcommittee ISO/TC 37/SC 4 has so far published standards such as LAF (Linguistic Annotation Framework, ISO 24612:2012), SynAF (Syntactic Annotation Framework, ISO 24615:2010), and LMF (Lexical Markup Framework, ISO 24613:2008). These standards define metamodels for data representation or the terminology for data description, and specify general requirements for linguistic resources. Other standards such as MAF (Morpho-syntactic Annotation Framework, ISO/FDIS 24611) and MLIF (Multilingual Information Framework, ISO 16642:2003) are still under development. Although a lot of work has been accomplished over the past few years, there are still many goals to accomplish and much work to be done.

The goal of this workshop is to present the current status of ISO standards in the domain of language resources and language technology, and to discuss current and future applications of these standards. A number of academics and experts from industry will present their work in the field of standardization and application of linguistic resources.

The workshop includes tutorials, research presentations, use-cases and reports that allow participants to get acquainted with the standards. The workshop organizers Andreas Witt and Ulrich Heid currently chair the German DIN group for language resource standards. Andreas Witt also convened the ISO working group Linguistic Annotation (ISO/TC 37/SC 4/WG 6).

Andreas Witt, Ulrich Heid

Workshop Organizers
