26.12.2013 Views

A computational grammar and lexicon for Maltese


InterActive Terminology for Europe (IATE)

The IATE² is the European Union’s multilingual term base, used by various institutions for the collection, dissemination and shared management of EU-specific terminology. It was launched in 1999 as a web-based infrastructure for all EU terminology resources. IATE incorporates and standardises all existing terminology databases of the EU’s translation services into a single database. It also includes a number of legacy databases, and now contains a total of approximately 1.4 million multilingual entries or 8.4 million terms, covering all 23 languages of the EU (including Maltese).

1.2.3 The Maltilex project<br />

There has long been interest in creating a computational lexicon for Maltese, though to date no concrete system exists. The Maltilex project was first announced in 1998, identifying the need for such a lexicon and outlining the scope of the project for creating one (Rosner et al., 1998). Rosner et al. (1999) go on to highlight the idea of automatic extraction, which involves tokenising a corpus and performing headword identification on it in order to obtain a lexicon.
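To make the extraction idea concrete, the following is a minimal sketch of such a pipeline, not the method of Rosner et al. (1999). The tokenisation rule (runs of letters, with hyphens and punctuation as separators) and the function names are assumptions made for illustration, and `identify_headword` is only a placeholder that maps each word form to itself.

```python
import re
from collections import Counter

# Assumption: a token is a run of word characters excluding digits and
# underscores (roughly, letters). Hyphens and punctuation separate tokens.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def tokenise(text):
    """Split raw text into lowercased word forms."""
    return [m.group(0).lower() for m in TOKEN_RE.finditer(text)]


def identify_headword(form):
    """Hypothetical placeholder: every form is treated as its own headword.

    A real system would map inflected forms to a shared headword here.
    """
    return form


def build_lexicon(corpus):
    """Map each headword to a frequency count of the forms found under it."""
    lexicon = {}
    for form in tokenise(corpus):
        head = identify_headword(form)
        lexicon.setdefault(head, Counter())[form] += 1
    return lexicon


lexicon = build_lexicon("Il-kelb jiekol. Il-klieb jieklu.")
for head in sorted(lexicon):
    print(head, dict(lexicon[head]))
```

With the placeholder in place, each distinct word form becomes its own entry; the quality of the resulting lexicon depends entirely on what replaces `identify_headword`.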

In his M.Sc. thesis, Dalli (2002a) describes a concrete implementation of this, using a weakly-supervised learning approach. The work details at length the machine learning algorithms enabling this extraction, adapting clustering techniques from bio-informatics. It also introduces the Lexicon Structuring Technique (LST), which attempts to identify lemmas in an unstructured list of words without requiring any prior rules. Sadly, the only Maltese corpus available at the time was very small (~2000 tokens) and the end results were very noisy and not practically usable. Although the corpus has since grown to 100 million words, it seems this experiment has not been repeated with the new, larger corpus.
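As a rough illustration of grouping word forms without prior rules, the sketch below clusters an unstructured word list greedily by surface string similarity. This is a simple stand-in and not Dalli's LST or his clustering algorithms; the similarity measure (`difflib.SequenceMatcher`), the threshold of 0.7 and the function names are all assumptions.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.7):
    """True if the two strings' similarity ratio reaches the threshold."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def cluster_words(words, threshold=0.7):
    """Greedy clustering: add each word to the first cluster that contains
    a sufficiently similar member, otherwise start a new cluster."""
    clusters = []
    for word in words:
        for cluster in clusters:
            if any(similar(word, member, threshold) for member in cluster):
                cluster.append(word)
                break
        else:
            # No existing cluster was close enough.
            clusters.append([word])
    return clusters


print(cluster_words(["kiteb", "kitbu", "mexa"]))
```

Each resulting cluster is a candidate group of forms belonging to one lemma. On a small corpus, an approach based purely on surface similarity can be expected to produce noisy groupings.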

A later student project by Attard (2005) concentrates on the infrastructure required for implementing such a lexicon as a collection of services. Despite going into significant technical detail, the framework suffered from a lack of flexibility and was not adopted in any lasting way by the project.

As the Maltilex project evolved into the Maltese Language Resource Server (MLRS), a new description of a lexicon structure was presented in Rosner et al. (2006). This paper describes the use of an Object Description Language (ODL) for the specification of the attributes and values that make up each lexical class, effectively acting as a kind of type system for a set of key-value pairs. However, it is not clear whether such a model has actually been written in ODL for Maltese. The paper treats the implementation of the lexicon database itself only briefly, noting that the relational model is not entirely suitable and that no satisfactory storage format had yet been decided upon.
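The notion of a type system over key-value pairs can be sketched as follows. This is not ODL syntax and is not taken from Rosner et al. (2006); the lexical classes, attribute names and permitted values are hypothetical and chosen only to illustrate the idea.

```python
# Hypothetical lexical classes: each maps attribute names to the set of
# values permitted for that attribute.
LEXICAL_CLASSES = {
    "noun": {"gender": {"masc", "fem"}, "number": {"sg", "pl"}},
    "verb": {"person": {"1", "2", "3"}, "number": {"sg", "pl"}},
}


def validate_entry(lexical_class, entry):
    """Check a dict of key-value pairs against its lexical class.

    Returns a list of error messages; an empty list means the entry conforms.
    """
    schema = LEXICAL_CLASSES.get(lexical_class)
    if schema is None:
        return [f"unknown lexical class: {lexical_class}"]
    errors = []
    for attr, value in entry.items():
        if attr not in schema:
            errors.append(f"unexpected attribute: {attr}")
        elif value not in schema[attr]:
            errors.append(f"invalid value for {attr}: {value}")
    for attr in schema:
        if attr not in entry:
            errors.append(f"missing attribute: {attr}")
    return errors


print(validate_entry("noun", {"gender": "fem", "number": "sg"}))
print(validate_entry("noun", {"gender": "neut", "number": "sg"}))
```

Keeping the class definitions as data, separate from the entries they constrain, means that new lexical classes can be added without changing the validation code.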

The most recent development on this road towards a computational lexicon is the announcement of a project to create a national online dictionary for Maltese³. Rather than focusing on extraction from a corpus, the project will instead be digitising Aquilina’s Maltese-English dic-

2 http://iate.europa.eu/iatediff/brochure/IATEbrochure_MT.pdf, accessed 2013-08-21<br />

3 National Council for the Maltese Language. http://kunsilltalmalti.gov.mt/projects, accessed 2013-06-24
