26.12.2013 Views

A computational grammar and lexicon for Maltese

A computational lexicon

As an initial step towards the creation of a lexicon, a platform for the collection of lexical resources in Maltese will be created. This collection will be hosted online and searchable in a single, uniform way, and all data should be extractable and easily convertible into other formats. It will initially be populated with data from the resources listed in section 1.2.5.

Choosing a highly flexible storage representation will allow new resources to be easily added to the collection as they become available. In this way we hope to enable the organic growth of a computational lexicon for Maltese, and thus avoid some of the startup problems encountered in previous attempts at building one (see section 1.2.3).
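As a rough illustration of what a flexible storage representation could look like, the sketch below keeps each lexical entry as a free-form record, so that resources with different fields can sit side by side and still be exported to other formats. The function names, field names and resource labels are assumptions made for this example only; they are not the schema or storage technology used in this work.

```python
import csv
import io
import json

# Hypothetical entries from two imagined resources; field names and the
# "resource-A"/"resource-B" labels are illustrative assumptions.
entries = []

def add_entry(lemma, source, **fields):
    """Store one lexical entry as a free-form record."""
    record = {"lemma": lemma, "source": source}
    record.update(fields)  # extra fields are kept as-is, no fixed schema
    entries.append(record)
    return record

add_entry("kiteb", "resource-A", pos="verb", gloss="write")
add_entry("ktieb", "resource-B", pos="noun", gender="masc")

def to_json(records):
    """Export all records as JSON, preserving non-ASCII characters."""
    return json.dumps(records, ensure_ascii=False, indent=2)

def to_tsv(records):
    """Export all records as TSV; fields missing from a record stay empty."""
    columns = sorted({key for record in records for key in record})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, delimiter="\t",
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Because no record is forced into a fixed set of columns, a new resource with previously unseen fields can be added without migrating the existing data; the export functions simply take the union of the fields they find.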

By then combining this collection of lexical resources with the morphological generation from the resource grammar, we will extend it into a full-form lexicon. Such lexicons are useful for spell checking and lemmatisation, particularly in morphologically rich languages such as Maltese, where the concept of an automatically derivable word stem is not so prominent. Including all inflectional forms in a lexicon will increase the number of word forms by a few orders of magnitude (in the case of Maltese). Producing these forms is highly repetitive and not viable to do manually, making it an ideal candidate for rule-based production by a computational grammar. This full-form lexicon will take two forms:

1. A searchable online database which stores all word forms (both generated and manually imported from other sources).

2. A monolingual GF dictionary module (DictMlt.gf), which uses smart paradigms to keep the lexicon as compact as possible.
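The relationship between a compact lexicon and its full-form expansion can be sketched in a few lines of Python. The paradigm function below is a deliberately simplified stand-in for the morphology of the resource grammar: it produces only the perfective forms of a strong triliteral verb, has been checked here only against the verb kiteb, and is not expected to generalise to other verbs. All names and tag labels in it are assumptions introduced for this example.

```python
from collections import defaultdict

# Toy paradigm (an assumption for illustration, not the resource grammar):
# perfective forms built from three root consonants and two vowels.
def perfective_forms(c1, c2, c3, v1, v2):
    short_stem = c1 + c2 + v1 + c3  # "ktib" for k-t-b with vowel i
    long_stem = c1 + v1 + c2        # "kit"
    return {
        "P1 Sg": short_stem + "t",
        "P2 Sg": short_stem + "t",
        "P3 Sg Masc": long_stem + v2 + c3,
        "P3 Sg Fem": long_stem + c3 + "et",
        "P1 Pl": short_stem + "na",
        "P2 Pl": short_stem + "tu",
        "P3 Pl": long_stem + c3 + "u",
    }

# Compact lexicon: one line per lemma, holding only the paradigm arguments.
compact_lexicon = {"kiteb": ("k", "t", "b", "i", "e")}

def build_full_form_index(lexicon):
    """Expand every lemma and index each word form by its analyses."""
    index = defaultdict(list)
    for lemma, args in lexicon.items():
        for tag, form in perfective_forms(*args).items():
            index[form].append((lemma, tag))
    return dict(index)

full_forms = build_full_form_index(compact_lexicon)

def is_known(word):
    """Spell-checking style lookup: is this a listed word form?"""
    return word in full_forms

def lemmatise(word):
    """Return the lemmas whose paradigms contain this word form."""
    return sorted({lemma for lemma, _ in full_forms.get(word, [])})
```

With this index, `lemmatise("kitbet")` maps the inflected form back to its lemma by plain lookup, without any stemming. The compact lexicon grows by one entry per lemma, while the full-form index grows by one entry per distinct inflected form.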

1.4.3 Organisation<br />

The remainder of this monograph is organised as follows. Chapter 2 covers the implementation of the Maltese resource grammar, including the testing carried out during development. Chapter 3 then covers the design of the platform for the collection of lexical resources, the use of the resource grammar to generate full inflection forms, and the creation of a monolingual dictionary module. Finally, in chapter 4 we draw some conclusions about the value of this thesis in the context of other related works and discuss some directions for future work.

The appendices contain a description of the RGL API and the lexical paradigms available in the Maltese implementation, analyses of certain linguistic phenomena relevant to the design of the resource grammar, and information about licensing and obtaining the source code for all the work described in this thesis.
