A computational grammar and lexicon for Maltese
InterActive Terminology for Europe (IATE)
The IATE 2 is the European Union’s multilingual term base used by various institutions for the
collection, dissemination and shared management of EU-specific terminology. It was launched
in 1999 as a web-based infrastructure for all EU terminology resources. IATE incorporates and
standardises all existing terminology databases of the EU’s translation services into a single
database. It also includes a number of legacy databases, and now contains a total of approximately
1.4 million multilingual entries or 8.4 million terms, covering all 23 languages of the EU
(including Maltese).
1.2.3 The Maltilex project
There has long been an interest in creating a computational lexicon for Maltese, though no
concrete system exists to date. The Maltilex project was first announced in 1998, identifying the
need for such a lexicon and outlining the scope of the project for creating one (Rosner et al.,
1998). Rosner et al. (1999) go on to highlight the idea of automatic extraction, which involves
tokenising a corpus and performing headword identification on it in order to obtain a lexicon.
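As a rough illustration of the first stage only, the following minimal sketch tokenises a small corpus and counts the distinct word forms. It is not the method of Rosner et al. (1999): the token pattern, the function names `tokenise` and `wordform_counts`, and the decision to split the article from its noun at the hyphen are all assumptions made here for illustration, and headword identification is not attempted.

```python
import re
from collections import Counter

# Assumed token pattern (not from the cited work): a run of Unicode letters,
# which covers Maltese letters such as ċ, ġ, ħ and ż, optionally followed by
# a trailing apostrophe as in "ta'". Hyphens act as separators, so the
# article in "il-kelb" becomes its own token.
TOKEN_RE = re.compile(r"[^\W\d_]+'?")


def tokenise(text):
    """Split text into lower-cased word-form tokens."""
    return [match.group(0).lower() for match in TOKEN_RE.finditer(text)]


def wordform_counts(corpus):
    """Count how often each word form occurs in an iterable of lines."""
    counts = Counter()
    for line in corpus:
        counts.update(tokenise(line))
    return counts
```

The resulting list of word forms would be the input to headword identification, which is where the real difficulty of the extraction approach lies.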
In his M.Sc. thesis, Dalli (2002a) describes a concrete implementation of this, using a
weakly-supervised learning approach. The work details at length the machine learning algorithms
enabling this extraction, adapting clustering techniques from bio-informatics. It also introduces
the Lexicon Structuring Technique (LST), which attempts to identify lemmas in an unstructured
list of words without requiring any prior rules. Unfortunately, the only Maltese corpus available
at the time was very small (~2000 tokens), and the end results were very noisy and not practically
usable. Although the corpus has since grown to 100 million words, it seems this
experiment has not been repeated on the larger corpus.
A later student project by Attard (2005) concentrates on the infrastructure required for implementing
such a lexicon as a collection of services. Despite going into significant technical
detail, the framework suffered from a lack of flexibility and was not adopted in any lasting way
by the project.
As the Maltilex project evolved into the Maltese Language Resource Server (MLRS), a new
description of a lexicon structure was presented in Rosner et al. (2006). This paper describes the
use of an Object Description Language (ODL) for the specification of the attributes and values
that make up each lexical class, effectively acting as a kind of type system for a set of key-value
pairs. However, it is not clear whether such a model has actually been written in ODL for Maltese.
The paper treats the implementation of the lexicon database itself only briefly, noting that
the relational model is not entirely suitable and that no satisfactory storage format had yet been
decided upon.
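To make the idea of a type system over key-value pairs more concrete, the following is a minimal sketch in Python. It does not use ODL syntax and is not taken from Rosner et al. (2006): the lexical class `NOUN_CLASS`, its attributes and permitted values, and the function `validate` are all hypothetical and serve only to illustrate the principle.

```python
# Hypothetical lexical class: each attribute name maps to the set of values
# it may take. The attributes and values are illustrative only.
NOUN_CLASS = {
    "gender": {"masculine", "feminine"},
    "number": {"singular", "plural"},
}


def validate(entry, lexical_class):
    """Check an entry (a dict of key-value pairs) against a lexical class.

    Returns a list of problems; an empty list means the entry conforms.
    """
    problems = []
    for attr, allowed in lexical_class.items():
        if attr not in entry:
            problems.append(f"missing attribute: {attr}")
        elif entry[attr] not in allowed:
            problems.append(f"invalid value for {attr}: {entry[attr]}")
    for attr in entry:
        if attr not in lexical_class:
            problems.append(f"unknown attribute: {attr}")
    return problems
```

Under these assumptions, `validate({"gender": "masculine", "number": "singular"}, NOUN_CLASS)` returns an empty list, while an entry with an unlisted attribute or value is reported as a problem rather than silently stored.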
The most recent development on this road towards a computational lexicon is the announcement
of a project to create a national online dictionary for Maltese 3. Rather than focusing on
extraction from a corpus, the project will instead be digitising Aquilina’s Maltese-English dic-
2 http://iate.europa.eu/iatediff/brochure/IATEbrochure_MT.pdf, accessed 2013-08-21<br />
3 National Council for the Maltese Language. http://kunsilltalmalti.gov.mt/projects, accessed 2013-06-24