26.12.2013 Views

A computational grammar and lexicon for Maltese

A computational grammar and lexicon for Maltese

A computational grammar and lexicon for Maltese

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Table 3.1: Estimation of the number of word <strong>for</strong>ms required in a full-<strong>for</strong>m <strong>lexicon</strong> of <strong>Maltese</strong>.<br />

Counts <strong>for</strong> each part of speech are calculated from (MLRS, 2013).<br />

Part of speech Count Word <strong>for</strong>ms Subtotal<br />

Noun 198,237 (56.1%) 40 7,929,480<br />

Verb 81,063 (22.9%) 966 78,306,858<br />

Adjective 49,853 (14.1%) 9 448,677<br />

Other 24,361 (6.9%) 1 24,361<br />

Total 353,514 (100%) 86,709,376<br />

3.2 Application<br />

This section details the design <strong>and</strong> implementation of a database <strong>for</strong> collecting together the<br />

available lexical resources <strong>for</strong> <strong>Maltese</strong>, along with the web application <strong>for</strong> interfacing with this<br />

database, named Ġabra (literally, ‘collection’). Details on accessing this application can be found<br />

in appendix D.<br />

3.2.1 Database<br />

Design<br />

As discussed previously, a central principle in the database design <strong>for</strong> this collection is that it<br />

maximises flexibility <strong>and</strong> can accommodate heterogeneous data. To this end, a very general<br />

top-level structure is needed which should minimally accommodate the following:<br />

1. A separation between lemma <strong>and</strong> word<strong>for</strong>m levels with a one-to-many relation from the<br />

<strong>for</strong>mer to the latter.<br />

2. Arbitrary feature names <strong>and</strong> values <strong>for</strong> each word<strong>for</strong>m <strong>and</strong> lexeme.<br />

3. Having a distinct entity <strong>for</strong> consonantal roots, which are shared by multiple entries of<br />

different lexical categories.<br />

4. Links between lexemes which can be given arbitrary relation names <strong>and</strong> properties.<br />

5. Attribution of any kind of entry to a source with associated licensing in<strong>for</strong>mation.<br />

The top-level structure adopted <strong>for</strong> this database is show in figure 3.1. Lexemes are the<br />

entries as we think of them in a dictionary. Each is represented minimally by a headword <strong>and</strong><br />

POS tag, but can often include a gloss in English <strong>and</strong> other features common to the entire entry.<br />

Word<strong>for</strong>ms represent the inflected <strong>for</strong>ms within a lexeme, <strong>and</strong> include the lemma as word<strong>for</strong>m.<br />

They only need to consist of a single string, but ideally will list the features that they inflect <strong>for</strong>.<br />

The roots are modelled as separate entities as they are shared by multiple lexemes. It could<br />

50

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!