26.12.2013 Views

A computational grammar and lexicon for Maltese


Abstract

A Computational Grammar and Lexicon for Maltese

JOHN J. CAMILLERI

Department of Computer Science and Engineering

Chalmers University of Technology and University of Gothenburg

Maltese is the national language of Malta and an official language of the European Union. While classified as Semitic, Maltese has been heavily influenced by the Romance languages and English, and features both root-and-pattern and concatenative morphologies. Despite its active use, the language is highly under-resourced in digital terms. This thesis contributes two computational resources for Maltese: a grammar and an online full-form lexicon.

The first part of this thesis deals with a computational grammar for Maltese, which is implemented using the Grammatical Framework (GF). GF is a multilingual grammar formalism based on using abstract syntax trees as language-independent semantic representations. Its Resource Grammar Library (RGL) already covers the morphology and basic syntax of some 27 languages from around the world. Maltese is the 28th addition to the RGL, and the first Semitic language in the library to be completed. The smart paradigms implemented in the morphological part of the grammar allow full inflection tables to be produced for any lexical unit, often requiring only a lemmatised form. This report looks at some of the more interesting implementational details of the grammar, discussing the compromises that had to be made along the way.

The second part covers the collection of various Maltese lexical resources into a single searchable collection, using a schema-less database to accommodate partial data from heterogeneous sources. We then use the smart paradigms from the morphological part of the grammar to automatically produce some 4 million inflection forms and extend the collection into a full-form computational lexicon, which can be used for morphological lookup and spell checking.

All the software and resources described in this thesis are open-source and free to use for any purpose.

Keywords: computational, grammar, syntax, morphology, lexicon, linguistics, Maltese
