26.12.2013 Views

A computational grammar and lexicon for Maltese


Chapter 3

Computational lexicon

This chapter begins by presenting a web application designed for collecting the heterogeneous lexical resources available for Maltese into a single database. After explaining the setup and implementation of this collection, we then go on to describe how it is combined with the resource grammar from the previous chapter to produce a full-form computational lexicon.

3.1 Method

3.1.1 Sources

The approach adopted in this work for constructing a computational lexicon for Maltese is to first build a platform where all existing lexical resources can be gathered into a single collection. While there are some large, high-quality print dictionaries available for Maltese (see section 1.2.1), the number and size of computational resources are only a fraction of this. Nevertheless, the hope is that an open platform for hosting and searching through resources from heterogeneous sources will be useful in its own right, and will even attract the addition of new lexical resources that may become available in the future. The sources available at the time of writing were:

• An exhaustive list of all 4,142 root-and-pattern verbs (including hypothetical forms), from the verbal roots database (Camilleri & Spagnol, 2013).

• A corpus of 654 broken plurals for both nouns and adjectives (Mayer et al., 2013).

• A list of over 2,500 verbal nouns listed in the Aquilina dictionary and other sources (Ellul, 2013).

• A Basic English-Maltese dictionary containing some 5,454 English entries (Falzon, 2012).
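As a rough illustration of what gathering differently structured resources into one searchable collection can look like, consider the following Python sketch. The class names `Entry` and `Collection`, the field names, and the three example entries are assumptions made here for illustration only; they are not taken from the web application or from the data files of the sources listed above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class Entry:
    """One lexical entry (hypothetical structure, for illustration)."""
    lemma: str
    source: str  # which resource the entry came from
    # Free-form fields, since each source records different information.
    data: Dict[str, Any] = field(default_factory=dict)


class Collection:
    """A single in-memory collection holding entries from every source."""

    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def add(self, entry: Entry) -> None:
        self.entries.append(entry)

    def search(self, lemma: str) -> List[Entry]:
        """Return all entries for a lemma, regardless of source."""
        return [e for e in self.entries if e.lemma == lemma]

    def sources(self) -> Set[str]:
        """Return the names of all sources represented in the collection."""
        return {e.source for e in self.entries}


# Illustrative entries only; the field names are assumptions.
collection = Collection()
collection.add(Entry("kiteb", "verbal roots database", {"root": "k-t-b"}))
collection.add(Entry("ktieb", "broken plurals corpus", {"plural": "kotba"}))
collection.add(Entry("ktieb", "basic English-Maltese dictionary", {"english": "book"}))

# One search returns the entries for "ktieb" from both sources that list it.
hits = collection.search("ktieb")
```

Keeping the source name on every entry means a single lookup can return complementary information from several resources while still showing where each piece came from.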

3.1.2 Heterogeneous data

Traditional relational databases work with a strict schema system, whereby the structure of all data is fixed at design time and all entries in the database necessarily conform to this schema. In this work, however, we are dealing with lexical resources from distinctly different sources,
