26.12.2013

A computational grammar and lexicon for Maltese


where the data structure does not in any way match between one source and the next. Given a fixed list of sources, it is not difficult to design a relational schema which accommodates the data from each; in the worst case this typically results in a large table with many null fields. This solution becomes problematic, however, as soon as one wants to include data from new resources that may become available in the future.
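The null-field problem can be made concrete with a small sketch. The three sources, their field names and the records below are hypothetical. The point is only that flattening them into one table whose columns are the union of all fields leaves many cells empty, and that every new source bringing a new field would add yet another column.

```python
# Hypothetical records from three lexical sources, each with its own fields.
SOURCES = {
    "source_a": [{"lemma": "ħtieġa", "gender": "f", "number": "sg"}],
    "source_b": [{"lemma": "ħtieġa", "gloss": "need"}],
    "source_c": [{"surface": "ħtiġiet", "lemma": "ħtieġa", "number": "pl"}],
}


def union_table(sources):
    """Flatten all records into one wide table whose columns are the union of all fields."""
    columns = sorted({key for records in sources.values() for rec in records for key in rec})
    # A field missing from a record becomes None, i.e. a null cell.
    rows = [[rec.get(col) for col in columns] for records in sources.values() for rec in records]
    return columns, rows


def count_nulls(rows):
    """Count the empty cells in the flattened table."""
    return sum(cell is None for row in rows for cell in row)


columns, rows = union_table(SOURCES)
print(columns)
print(count_nulls(rows), "of", len(columns) * len(rows), "cells are null")
```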

To maximise adaptability to future resources, the collection must be flexible enough to store heterogeneous data as it is, without forcing it into a common structure. This not only avoids having to design an inefficient schema that is the union of the structures of all the current resources but, more importantly, ensures that future data can easily be added to the collection without having to conform to any specific schema. More information about how this is achieved can be found below in section 3.2.1.
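By contrast, a schema-free collection keeps each entry with whatever fields its source provides. The following is a minimal in-memory sketch of that idea, assuming simple equality queries. The `Collection` class and the field names are hypothetical and are not the mechanism described in section 3.2.1.

```python
from typing import Any


class Collection:
    """A minimal schema-free collection: each entry keeps whatever fields its source provides."""

    def __init__(self) -> None:
        self._docs: list[dict[str, Any]] = []

    def insert(self, doc: dict[str, Any]) -> None:
        # No schema check: fields introduced by a future source are accepted as-is.
        self._docs.append(dict(doc))

    def find(self, **query: Any) -> list[dict[str, Any]]:
        # An entry matches only if it has every queried field with the given value;
        # entries lacking a field are skipped rather than padded with nulls.
        return [d for d in self._docs if all(k in d and d[k] == v for k, v in query.items())]


lexicon = Collection()
lexicon.insert({"lemma": "ħtieġa", "pos": "noun", "gender": "f"})
lexicon.insert({"lemma": "ħtieġa", "gloss": "need", "source": "dictionary"})
# A later resource adds fields no earlier entry has, without any schema change.
lexicon.insert({"surface": "ħtiġiet", "lemma": "ħtieġa", "number": "pl"})

print(len(lexicon.find(lemma="ħtieġa")))
print(lexicon.find(number="pl"))
```

Because queries simply skip entries that lack a queried field, the collection never needs a column for every field that any source has ever used.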

3.1.3 Full-forms

Traditional dictionaries are organised by head word or lemma, and the other word forms of that lemma are often given only as suffixes. Take the following entry from the Serracino-Inglott (2003, p. 218) dictionary as an example:

ħtieġa n.f.s., pl. -t, -ijiet ...

The entry gives the singular form ħtieġa (‘need’) as the head word, together with the suffixes that form its plurals. These suffixes, however, cannot be blindly appended to the head word: the correct plural forms in this case are ħtiġiet and ħtiġijiet. So even though the dictionary gives us some information about the other word forms of this lemma, further knowledge of the language is needed to apply the rules correctly.
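The gap between the dictionary notation and the real forms can be shown by parsing the entry and appending its suffixes literally. The parsing rule in the hypothetical `parse_entry` function is an assumption that only covers entries shaped like this one. The attested plurals are the ones given in the text.

```python
ENTRY = "ħtieġa n.f.s., pl. -t, -ijiet"
ATTESTED_PLURALS = {"ħtiġiet", "ħtiġijiet"}


def parse_entry(entry: str) -> tuple[str, list[str]]:
    """Split an entry of the form 'HEAD tags, pl. -s1, -s2' into head word and plural suffixes."""
    head, _, rest = entry.partition(" ")
    _, _, plural_part = rest.partition("pl.")
    # Drop whitespace and the leading hyphen that marks each suffix.
    suffixes = [s.strip().lstrip("-") for s in plural_part.split(",") if s.strip()]
    return head, suffixes


head, suffixes = parse_entry(ENTRY)
# Literal concatenation ignores the stem change the real plurals undergo.
naive = [head + suffix for suffix in suffixes]
print(naive)
print("matches attested forms:", sorted(set(naive) & ATTESTED_PLURALS))
```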

This is where the importance of a full-form lexicon becomes apparent. Even where inflection is affix-based, other morpho-phonological rules can come into play and alter the stem, as in the change from ħtieġa to ħtiġiet. Where inflection is non-concatenative, as is often the case in Maltese, storing all inflected forms is essential if the lexicon is to be used for any kind of lookup or lemmatisation.
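With a full-form lexicon, lemmatisation becomes a plain lookup: inverting the lemma-to-forms mapping gives a form-to-lemma index. In this sketch the ħtieġa forms come from the text, while ktieb/kotba (‘book’/‘books’) is added as a familiar broken plural. The function names are hypothetical.

```python
from collections import defaultdict

# Full-form lexicon: each lemma lists all of its stored word forms.
FULL_FORMS = {
    "ħtieġa": ["ħtieġa", "ħtiġiet", "ħtiġijiet"],
    "ktieb": ["ktieb", "kotba"],
}


def build_lemma_index(full_forms: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert lemma -> forms into form -> lemmas, so any stored form can be lemmatised."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for lemma, forms in full_forms.items():
        for form in forms:
            index[form].add(lemma)
    return dict(index)


def lemmatise(form: str, index: dict[str, set[str]]) -> set[str]:
    # A set is returned because one surface form may belong to several lemmas.
    return index.get(form, set())


index = build_lemma_index(FULL_FORMS)
print(lemmatise("ħtiġijiet", index))
print(lemmatise("kotba", index))
```

No rule is needed to recover ktieb from kotba; the stored form alone is enough, which is exactly what non-concatenative inflection requires.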

Storing all forms versus generating them

There are two options when building a full-form lexicon:

1. All word forms are stored as individual entries in a database, each linked to its parent head word.
2. Only the head word itself is stored, and inflected word forms are produced at run time by some automaton.
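The two options can be contrasted in a short sketch that provides the same `forms_of(lemma)` interface both ways. The rule in `forms_generated` is a deliberately naive, hypothetical stand-in for "some automaton". It shows that generation is only as reliable as the language's morphology is predictable, since it misses the stem change in ħtiġiet.

```python
# Option 1 (stored): every word form is an entry of its own, linked to its head word.
STORED: dict[str, list[str]] = {
    "ħtieġa": ["ħtieġa", "ħtiġiet", "ħtiġijiet"],
}


def forms_stored(lemma: str) -> list[str]:
    return STORED.get(lemma, [])


# Option 2 (generated): only the head word is kept; forms come from a rule at run time.
# Hypothetical rule: drop a final -a and add -iet.
def forms_generated(lemma: str) -> list[str]:
    stem = lemma[:-1] if lemma.endswith("a") else lemma
    return [lemma, stem + "iet"]


def stored_entries(lexicon: dict[str, list[str]]) -> int:
    # Option 1 costs one entry per form; option 2 costs one entry per head word.
    return sum(len(forms) for forms in lexicon.values())


for name, forms_of in [("stored", forms_stored), ("generated", forms_generated)]:
    print(name, forms_of("ħtieġa"))
print("entries, option 1:", stored_entries(STORED), "| option 2:", len(STORED))
```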

The former option generally demands more storage space, while the latter depends on the morphological predictability of the language in question. In the case of Maltese, the prevalence of unpredictable broken plurals for nouns and adjectives is a clear indication that some way of storing full forms is needed, even if other plurals may be
