A computational grammar and lexicon for Maltese
Table 3.1: Estimation of the number of word forms required in a full-form lexicon of Maltese.
Counts for each part of speech are calculated from (MLRS, 2013).

Part of speech   Count              Word forms   Subtotal
Noun             198,237 (56.1%)    40           7,929,480
Verb             81,063 (22.9%)     966          78,306,858
Adjective        49,853 (14.1%)     9            448,677
Other            24,361 (6.9%)      1            24,361
Total            353,514 (100%)                  86,709,376
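The arithmetic behind Table 3.1 can be reproduced with a short sketch: each subtotal is the entry count for a part of speech multiplied by its number of word forms, and the total is the sum of the subtotals. The figures are copied from the table; the names `TABLE_3_1` and `estimate_wordforms` are illustrative assumptions, not part of the original work.

```python
# Entry counts and word forms per entry, copied from Table 3.1.
# The structure and names here are illustrative, not the author's code.
TABLE_3_1 = {
    "Noun": (198_237, 40),
    "Verb": (81_063, 966),
    "Adjective": (49_853, 9),
    "Other": (24_361, 1),
}


def estimate_wordforms(table):
    """Return per-category subtotals and the overall total of word forms."""
    subtotals = {pos: count * forms for pos, (count, forms) in table.items()}
    return subtotals, sum(subtotals.values())


subtotals, total = estimate_wordforms(TABLE_3_1)
for pos, subtotal in subtotals.items():
    print(f"{pos:<10} {subtotal:>12,}")
print(f"{'Total':<10} {total:>12,}")
```

Verbs account for roughly 90% of the estimated total because of their large paradigm, even though they make up less than a quarter of the entries.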
3.2 Application
This section details the design and implementation of a database for collecting together the
available lexical resources for Maltese, along with the web application for interfacing with this
database, named Ġabra (literally, ‘collection’). Details on accessing this application can be found
in appendix D.
3.2.1 Database<br />
Design<br />
As discussed previously, a central principle in the database design for this collection is that it
maximises flexibility and can accommodate heterogeneous data. To this end, a very general
top-level structure is needed which should minimally accommodate the following:
1. A separation between lemma and wordform levels, with a one-to-many relation from the
former to the latter.
2. Arbitrary feature names and values for each wordform and lexeme.
3. A distinct entity for consonantal roots, which are shared by multiple entries of
different lexical categories.
4. Links between lexemes which can be given arbitrary relation names and properties.
5. Attribution of any kind of entry to a source with associated licensing information.
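The five requirements above can be sketched as a small set of record types. This is a minimal illustration under stated assumptions, not the schema actually used in the database: every class name, field name, and feature key below is hypothetical, and the Maltese entries are included only as sample data.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Source:
    """Requirement 5: provenance and licensing for any kind of entry."""
    key: str
    license: str


@dataclass
class Root:
    """Requirement 3: a consonantal root shared by several lexemes."""
    radicals: str
    source: Optional[Source] = None


@dataclass
class Wordform:
    """An inflected form; only the surface string is mandatory."""
    surface_form: str
    features: Dict[str, Any] = field(default_factory=dict)  # requirement 2
    source: Optional[Source] = None


@dataclass
class Lexeme:
    """A dictionary-style entry: minimally a headword and a POS tag."""
    lemma: str
    pos: str
    features: Dict[str, Any] = field(default_factory=dict)  # requirement 2
    root: Optional[Root] = None
    wordforms: List[Wordform] = field(default_factory=list)  # requirement 1
    source: Optional[Source] = None


@dataclass
class Link:
    """Requirement 4: a named relation between two lexemes."""
    relation: str
    from_lexeme: Lexeme
    to_lexeme: Lexeme
    properties: Dict[str, Any] = field(default_factory=dict)


# Hypothetical sample data: one root shared by a verb and a noun.
src = Source(key="example-source", license="CC BY")
ktb = Root(radicals="k-t-b", source=src)
kiteb = Lexeme("kiteb", "VERB", {"gloss": "write"}, ktb,
               [Wordform("kiteb"), Wordform("kitbet", {"gender": "f"})], src)
ktieb = Lexeme("ktieb", "NOUN", {"gloss": "book"}, ktb,
               [Wordform("ktieb", {"number": "sg"}),
                Wordform("kotba", {"number": "pl"})], src)
link = Link("derived_from", ktieb, kiteb, {"note": "illustrative only"})
```

Keeping features as free-form dictionaries rather than fixed fields is what lets entries from different sources carry different annotations without changes to the structure.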
The top-level structure adopted for this database is shown in figure 3.1. Lexemes are the
entries as we think of them in a dictionary. Each is represented minimally by a headword and
POS tag, but can often include a gloss in English and other features common to the entire entry.
Wordforms represent the inflected forms within a lexeme, and include the lemma as a wordform.
They only need to consist of a single string, but ideally will list the features that they inflect for.
The roots are modelled as separate entities as they are shared by multiple lexemes. It could