A computational grammar and lexicon for Maltese
A computational grammar and lexicon for Maltese
A computational grammar and lexicon for Maltese
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
e argued that this separation is unnecessary, <strong>and</strong> all the root in<strong>for</strong>mation can be stored in the<br />
lexemes (de-normalised).<br />
In addition to the above, all entries optionally have a source field which indicates the citation<br />
reference of the source of the entry. This exists so that the in<strong>for</strong>mation stored in the database<br />
does not become disconnected from its original authors (the sources entity is not shown in the<br />
schema diagram).<br />
roots<br />
radicals<br />
type<br />
variant (Int)<br />
lemma<br />
pos<br />
lexemes<br />
gloss<br />
<strong>for</strong>m (Int)<br />
transitivity<br />
hypothetical (Bool)<br />
frequency (Bool)<br />
...<br />
word<strong>for</strong>ms<br />
surface_<strong>for</strong>m<br />
gender<br />
number<br />
aspect<br />
subject<br />
do<br />
io<br />
polarity<br />
...<br />
Figure 3.1: Basic data schema used in the <strong>lexicon</strong>. Each box shows (in order) the entity name,<br />
the fields that are always present, <strong>and</strong> optional fields that only occur in some documents. All<br />
fields are of type string unless marked. Primary <strong>and</strong> <strong>for</strong>eign key fields are omitted.<br />
Realisation<br />
The database engine chosen <strong>for</strong> implementing this <strong>lexicon</strong> is MongoDB 1 , a document-oriented<br />
system which stores data as JSON-style documents 2 with dynamic schemas. The great advantage<br />
of this system is that documents in the same collection can have different structures, which<br />
is exactly what is needed in this project. This makes it very easy to add new features to any<br />
entry without changing the table schema <strong>and</strong> ending up with lots of null fields.<br />
A side effect of this however is that the database engine does not itself h<strong>and</strong>le joins, as<br />
with a traditional RDBMS. Thus when modelling data using MongoDB, data de-normalisation<br />
is often recommended 3 . This means that rather than splitting data between multiple tables<br />
<strong>and</strong> depending on <strong>for</strong>eign keys <strong>and</strong> joins, the same data may be stored in multiple locations.<br />
This introduces some overhead in maintenance <strong>and</strong> introduces a risk of data inconsistency, but<br />
makes data retrieval <strong>and</strong> searching easier <strong>and</strong> faster.<br />
1 http://www.mongodb.org/, accessed 2013-09-05<br />
2 Note about MongoDB nomenclature: a “collection” can be thought of as a table, a “document” is an entry in a<br />
collection, <strong>and</strong> a “field” is analogous to a table column. The concept of a schema is not en<strong>for</strong>ced at all, <strong>and</strong> the term<br />
is used loosely.<br />
3 See http://docs.mongodb.org/manual/core/data-modeling/, accessed 2013-09-03<br />
51