13.07.2015 Views

WWW/Internet - Portal do Software Público Brasileiro

ISBN: 978-972-8939-25-0 © 2010 IADIS

Although Apertium has technical advantages widely recognized and praised by the international scientific community, and despite its structured and open knowledge base, its popularization has been facing a number of obstacles. The development of language pairs for Apertium is perceived as complex, based on lengthy XML files without interface alternatives that allow for the development and maintenance of linguistic knowledge bases. Therefore, all efforts are based on textual tools.

This fact is a hindrance to the evolution of the available language pairs precisely because the number of users able to develop knowledge for the tool is strictly limited to experts and/or trained users.

2. OBJECTIVES

When new users are interested in contributing to the development of a given language pair, they must go through a period of identification and learning about structures, command line tools and processes, which might be extremely complex for a layman in Computing. Indeed, this learning period may become rather long and complicated due to the lack of formalization and tools, thus discouraging countless potential collaborators. In this light, the main objective of the present work is to minimize this negative factor.

Among the aspects that must be lightened and simplified within the interaction-interface environment, we would like to mention the following: the marking of XML files, the compilation of XML bases in Apertium, as well as the organization, the acronyms and the internal structure of the knowledge base. Furthermore, with the ever-growing knowledge about language pairs, the direct manipulation of these files has also become intricate from a human point of view.

In the present paper, we present a formalization proposal of a part of the process of a language pair creation and development for translation, which we describe through interfaces and stages of knowledge specification.
This study has allowed for the dissemination of a set of interfaces which, in turn, make up a more adequate and efficient alternative to the current process. We named this set WiKLang, and its main objectives are the following:

• Standardize both the understanding and the process, and make them less dependent on textual scripts;
• Reduce to a minimum the Computing knowledge needed for a new collaborator to understand how to create and develop language pairs;
• Allow for the manipulation of the knowledge bases in a reliable, generic fashion, including management functions;
• Stimulate both translation improvement between existing language pairs and the development of new pairs, thus making MTs, such as Apertium, more reliable and with a bulkier knowledge base.

3. METHODOLOGY

In order to achieve the previously mentioned objectives, we have followed the steps described below:

• Interaction with the Apertium team so as to understand the interface and its architecture;
• Identification of problems and aspects that could be improved in the interaction with Apertium developer-users;
• Formalization of the knowledge specification process used in Apertium;
• Creation of a project of an interaction-interface environment (WiKLang) using concepts similar to Apertium’s, as a more adequate alternative to the linguistic knowledge specification needed to translate language pairs;
• Development of a non-functional prototype to be assessed by the Apertium team.

4. DATA DEVELOPMENT ON THE APERTIUM PROJECT

Apertium is an open-source platform for creating rule-based machine translation systems. It was initially designed for closely-related languages, but has also been adapted to work better for less related languages.
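To give a concrete sense of the XML knowledge bases that contributors currently edit by hand, the following minimal Python sketch reads a small bilingual dictionary fragment and lists its translation pairs. The fragment is a simplified, hypothetical sample written for this illustration (it is not taken from the paper or from any released language pair), and the element names (`e`, `p`, `l`, `r`, `s`) are assumed to follow the general style of Apertium bilingual dictionaries. The function names `side` and `read_entries` are likewise illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified bilingual-dictionary fragment (illustration only).
# Element names are assumed to follow the style of Apertium dictionaries:
# <e> entry, <p> pair, <l> left side, <r> right side, <s n="..."/> symbol tag.
SAMPLE_DIX = """
<dictionary>
  <sdefs>
    <sdef n="n"/>
    <sdef n="adj"/>
  </sdefs>
  <section id="main" type="standard">
    <e><p><l>house<s n="n"/></l><r>casa<s n="n"/></r></p></e>
    <e><p><l>big<s n="adj"/></l><r>grande<s n="adj"/></r></p></e>
  </section>
</dictionary>
"""


def side(element):
    """Return (lemma, [symbol tags]) for one side (<l> or <r>) of a pair."""
    lemma = (element.text or "").strip()
    tags = [s.get("n") for s in element.findall("s")]
    return lemma, tags


def read_entries(xml_text):
    """Collect every left/right translation pair found in the XML text."""
    root = ET.fromstring(xml_text)
    entries = []
    for pair in root.iter("p"):
        left, right = pair.find("l"), pair.find("r")
        if left is None or right is None:
            continue  # skip malformed pairs instead of failing
        entries.append((side(left), side(right)))
    return entries


if __name__ == "__main__":
    for (src, src_tags), (tgt, tgt_tags) in read_entries(SAMPLE_DIX):
        print(f"{src} <{'.'.join(src_tags)}> -> {tgt} <{'.'.join(tgt_tags)}>")
```

Even in this tiny sample, the linguistic information (lemmas and grammatical symbols) is interleaved with markup, which suggests why direct manipulation of large files of this kind becomes intricate for contributors without a Computing background.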
