26.12.2013 Views

A computational grammar and lexicon for Maltese



In many cases it is helpful to choose one or more reference languages, that is, languages which have already been implemented in the RGL and which are familiar to the grammarian. Checking the linearisation of certain constructions in such a reference language before implementing them in the current language is often useful. However, this comes with the proviso that resource grammars are not intended for general translation ‘out of the box’. Rather, they should provide the means for an application grammarian to express what they want in a grammatically correct way. For this reason it can often be hard to judge whether the translation produced by a resource grammar is correct or not, as this depends on how that grammar is used within the context of an application grammar.

2.4.3 Treebanks

A number of treebank files were written during development to test specific aspects of the grammar. These treebanks are listed in table 2.20. In order to speed up the test-develop cycle, a simple script was written for linearising sets of test trees against the current grammar and indicating those which are linearised incorrectly. The script is written in Haskell and uses the PGF Haskell library[7], which is provided as part of the GF distribution. The script, along with the treebank files themselves, can be found under the test/regression/ directory of this work’s source code repository (see appendix D).
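The regression loop described above can be sketched as follows. This is a minimal illustration in Python, not the Haskell script from the repository. The names `TreeCase`, `find_failures` and `toy_linearise` are hypothetical, and the `linearise` callable stands in for a call into the PGF library. The toy lookup table contains one deliberately wrong form so that a failure is reported.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class TreeCase:
    tree: str      # abstract syntax tree, written as a string
    language: str  # name of the concrete language (hypothetical label)
    expected: str  # gold-standard linearisation


def find_failures(
    cases: List[TreeCase],
    linearise: Callable[[str, str], str],
) -> List[Tuple[TreeCase, str]]:
    """Return (case, actual) pairs whose linearisation differs from the gold standard."""
    failures = []
    for case in cases:
        actual = linearise(case.tree, case.language)
        if actual.strip() != case.expected.strip():
            failures.append((case, actual))
    return failures


def toy_linearise(tree: str, language: str) -> str:
    # Assumption: a lookup table stands in for the compiled grammar.
    # The second entry is deliberately wrong, to show a reported failure.
    table = {
        ("DetCN (DetQuant IndefArt NumSg) (UseN airplane_N)", "Maltese"): "ajruplan",
        ("DetCN (DetQuant DefArt NumSg) (UseN airplane_N)", "Maltese"): "il-ajruplan",
    }
    return table.get((tree, language), "")


CASES = [
    TreeCase("DetCN (DetQuant IndefArt NumSg) (UseN airplane_N)", "Maltese", "ajruplan"),
    TreeCase("DetCN (DetQuant DefArt NumSg) (UseN airplane_N)", "Maltese", "l-ajruplan"),
]

if __name__ == "__main__":
    for case, actual in find_failures(CASES, toy_linearise):
        print(f"FAIL {case.tree}: expected {case.expected!r}, got {actual!r}")
```

Passing the lineariser in as a function keeps the comparison logic independent of how the grammar is actually loaded, so the same check could be run against any backend.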

The treebank files are written in a specific format which uses Org-mode syntax for plain text tables[8]. An example of this format is given below:

| AST                                               | English      | Maltese    |
|---------------------------------------------------+--------------+------------|
| DetCN (DetQuant IndefArt NumSg) (UseN airplane_N) | an airplane  | ajruplan   |
| DetCN (DetQuant DefArt NumSg) (UseN airplane_N)   | the airplane | l-ajruplan |
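A table in this format can be read with a few lines of code. The following Python sketch is only an illustration of how such a file might be parsed, and is not taken from the repository. The function name `parse_org_table` is hypothetical, and the sketch assumes that the first table row is the header and that no cell contains a `|` character.

```python
from typing import Dict, List


def parse_org_table(text: str) -> List[Dict[str, str]]:
    """Parse an Org-mode plain text table into one dict per data row."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # ignore anything outside the table
        if line.startswith("|-"):
            continue  # skip horizontal rules such as |---+---|
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # assumption: first row holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


SAMPLE = """
| AST | English | Maltese |
|-----+---------+---------|
| DetCN (DetQuant IndefArt NumSg) (UseN airplane_N) | an airplane | ajruplan |
| DetCN (DetQuant DefArt NumSg) (UseN airplane_N) | the airplane | l-ajruplan |
"""

if __name__ == "__main__":
    for row in parse_org_table(SAMPLE):
        print(row["AST"], "=>", row["Maltese"])
```

Each row becomes a mapping from column name to cell text, so the AST column can be linearised and compared against the expected string in each language column.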

Verb inflection treebank<br />

In addition to the treebanks listed in table 2.20, a small treebank of Maltese verb inflections was also compiled during this work. This treebank[9] consists of full inflection tables of 74 Maltese verbs of different paradigms, tabulated in CSV format. This amounts to a total of 70,448 unique wordforms which have been manually checked. Unfortunately, these could not be converted to the required format in the time available and are thus excluded from the results in the next section.

[7] http://hackage.haskell.org/packages/archive/gf/3.1.6.2/doc/html/PGF.html, accessed 2013-09-05

[8] http://orgmode.org/manual/Tables.html, accessed 2013-07-23

[9] Refer to appendix D for more details.

