A computational grammar and lexicon for Maltese
A computational lexicon
As an initial step towards the creation of a lexicon, a platform for the collection of lexical resources
in Maltese will be created. This collection will be hosted online and searchable in a single, uniform way,
and all data should be extractable and easily convertible into other formats. It will initially be
populated with data from the resources listed in section 1.2.5.
Choosing a highly flexible storage representation will allow new resources to be easily
added to the collection as they become available. In this way we hope to enable the organic
growth of a computational lexicon for Maltese, and thus avoid some of the startup problems
encountered in previous attempts at building one (see section 1.2.3).
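As a rough illustration of what a flexible storage representation could look like, the sketch below stores each entry as a free-form record and exports the collection to JSON and CSV. It is not the platform's actual design: the function names, the field names (`pos`, `root`, `gloss`) and the source labels `resource_a` and `resource_b` are all assumptions made for this example.

```python
import csv
import io
import json


def make_entry(lemma, source, **fields):
    """Build a schema-free entry; only `lemma` and `source` are required."""
    return {"lemma": lemma, "source": source, **fields}


def to_json(entries):
    """Serialise the collection as JSON, keeping non-ASCII letters readable."""
    return json.dumps(entries, ensure_ascii=False, indent=2)


def to_csv(entries):
    """Serialise the collection as CSV.

    The column set is the union of all keys, so resources that record
    different information still fit into one table; missing values are
    left empty.
    """
    columns = sorted({key for entry in entries for key in entry})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


# Two entries from hypothetical sources, each recording different fields.
entries = [
    make_entry("kiteb", "resource_a", pos="verb", root="k-t-b"),
    make_entry("ktieb", "resource_b", pos="noun", gloss="book"),
]
```

Because no fixed schema is imposed, a new resource with extra fields can be added without changing existing entries; only the export step needs to cope with the wider set of columns.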
By then combining this collection of lexical resources with the morphological generation
provided by the resource grammar, we will extend it into a full-form lexicon.
Such lexicons are useful for spell checking and lemmatisation, particularly in morphologically rich
languages such as Maltese, where the concept of an automatically derivable word stem is not
so prominent. Including all inflectional forms in a lexicon will increase the number of word
forms by a few orders of magnitude (in the case of Maltese). Producing these forms is highly repetitive and
not feasible to do manually, making it an ideal candidate for rule-based production by a
computational grammar. This full-form lexicon will take two forms:
1. A searchable online database which stores all word forms (both generated and manually
imported from other sources).
2. A monolingual GF dictionary module (DictMlt.gf), which uses smart paradigms to keep
the lexicon as compact as possible.
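To make the idea of a full-form lexicon more concrete, here is a minimal sketch of how inflection tables can be inverted into a form-to-lemma index that supports lemmatisation and a simple spell check. The table is written out by hand for illustration, using the perfect-tense forms of the verb kiteb ('to write'); in the approach described above such tables would come from the resource grammar. The function names and the feature encoding are hypothetical.

```python
from collections import defaultdict

# Hand-listed stand-in for grammar output: lemma -> {features: word form}.
# The (tense, number, person) feature tuples are an assumption of this sketch.
GENERATED = {
    "kiteb": {
        ("perf", "sg", "1"): "ktibt",
        ("perf", "sg", "2"): "ktibt",
        ("perf", "sg", "3m"): "kiteb",
        ("perf", "sg", "3f"): "kitbet",
        ("perf", "pl", "1"): "ktibna",
        ("perf", "pl", "2"): "ktibtu",
        ("perf", "pl", "3"): "kitbu",
    }
}


def build_full_form_index(tables):
    """Invert the inflection tables into: word form -> [(lemma, features)]."""
    index = defaultdict(list)
    for lemma, table in tables.items():
        for features, form in table.items():
            index[form].append((lemma, features))
    return dict(index)


def lemmatise(index, word):
    """Return the sorted list of lemmas that have `word` as one of their forms."""
    return sorted({lemma for lemma, _ in index.get(word, [])})


def is_known(index, word):
    """Simple spell check: is `word` an attested or generated form?"""
    return word in index


index = build_full_form_index(GENERATED)
```

With this index, `lemmatise(index, "kitbet")` looks the form up directly instead of trying to strip affixes, which is the point of storing full forms when stems are not easily derivable. A form such as "ktibt" maps to two analyses, since the first and second person singular coincide.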
1.4.3 Organisation
The remainder of this monograph is organised as follows. Chapter 2 covers the implementation
of the Maltese resource grammar, including the testing carried out during development.
Chapter 3 then covers the design of the platform for the collection of lexical resources, the use of
the resource grammar to generate full inflection forms, and the creation of a monolingual dictionary
module. Finally, in chapter 4 we draw some conclusions about the value of this thesis
in the context of other related work and discuss some directions for future work.
The appendices contain a description of the RGL API and the lexical paradigms available
in the Maltese implementation, analyses of certain linguistic phenomena relevant to the design
of the resource grammar, and information about licensing and obtaining the source code for all
the work described in this thesis.