02.05.2014 Views

Proceedings - Österreichische Gesellschaft für Artificial Intelligence


Preface

The interaction between Human Language Technology (HLT) and Digital Humanities (DH) at large has been of interest in various projects and initiatives in recent years, aiming to bring forward language resources and tools for the Humanities, Social Sciences and Cultural Heritage.

The specific focus of LThist 2012 lies on the development of technology and resources required for processing historical texts. Workshop contributors and participants discuss ways and strategies for shaping HLT resources (tools, data and metadata) in ways that are maximally beneficial for researchers in the Humanities. The necessity for a strong interplay between proponents from language technology and from the Humanities is also reflected in the invited talks. While Caroline Sporleder takes a language technology perspective, Sonia Horn addresses the needs and requirements from a medical historian's point of view. A major aspect of the workshop is the exchange of experiences with, and comparison of, tools, approaches, and standards that make historical texts accessible to automatic processing. Moreover, LThist encourages the interchange of historical data and processing tools.

In the present workshop, historical texts are understood in two ways: i) texts as documents of older forms of languages, and ii) texts as documentation of historical content. Accordingly, the contributions comprise a broad range of topics, genres and diachronic language varieties, including scientific prose, narratives, folk tales, riddles etc., as well as trade-related documents and marriage license books, the latter being a valuable resource for demographic studies. The presented papers address various aspects of data preparation and (semi-)automatic processing for a number of languages including Old Swedish, Late Middle English, Middle English, Early Modern English and Modern English, diachronic varieties of German, Dutch and Spanish, and Old Occitan. The proposed approaches and technical solutions center around problem areas such as improving the OCR quality of historical texts, orthography harmonization and mapping historical to modern word forms, as prerequisites for automatic mining of historical texts. Also, the possibilities of cross-language transfer of morphosyntactic and syntactic annotation from resource-rich source languages to under-resourced target languages are examined. Technical infrastructures specifically tailored for historical corpora are discussed, including mark-up languages for historical texts and representation formats for diachronic lexical databases, processing tools and architectures.

Overall, LThist 2012 reflects well the current discussions regarding automatic processing of historical texts, where OCR errors and the lack of harmonization in orthography are still major practical issues, but where machine learning and cross-language transfer are also increasingly coming into focus.

Thierry Declerck, Brigitte Krenn and Karlheinz Mörth
Workshop Organizers
Saarbrücken & Vienna, September 2012
