Programme booklet (pdf)

CLIN 21 – CONFERENCE PROGRAMME

Technology recycling for closely related languages: Dutch and Afrikaans

Pilon, Suléne¹ and Van Huyssteen, Gerhard²

¹ North-West University (VTC)

² North-West University (PC)

Abstract

If two languages (L1 and L2) are similar enough, the development of technologies for L2 can be expedited by recycling existing L1 resources. This process is called technology recycling, and its success depends greatly on the degree of similarity between the two languages in question. Other strategies can, however, be employed to improve the efficiency of L1 technologies on L2 data, and in this research we experiment with one such strategy, viz. lexical conversion as a pre-processing step. We explore the possibility of using rule-based lexical conversion to improve the accuracy of Dutch technologies when annotating Afrikaans data. The rationale is that Dutch technologies should perform better on Afrikaans data that appears more Dutch-like, even if the conversion does not yield a good Dutch translation. To do the lexical conversion, we developed an Afrikaans to Dutch convertor (A2DC), which obtains an accuracy of more than 72% when converting Afrikaans words to Dutch. For our experiment we use a state-of-the-art Dutch POS tagger and parser to annotate raw Afrikaans data. The same data is then converted with A2DC and once again annotated with the Dutch technologies. In both experiments the conversion has a notably positive effect on the performance of the Dutch technologies. The biggest difference is observed in the POS tagging task, with the overall accuracy increasing from 62.6% when annotating raw Afrikaans data to 80.6% when annotating converted data, while the parsing F-score improves from 0.44 (raw data) to more than 0.68 (converted data).

Corresponding author: sulene.pilon@nwu.ac.za
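
The pipeline described in the abstract (convert Afrikaans tokens to Dutch-like forms, then hand them to a Dutch tool) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the lexicon entries, the two orthographic rules, and the function names are hypothetical and are not taken from A2DC, and the tagger is any callable supplied by the caller rather than a specific Dutch tool.

```python
import re

# Hypothetical word-level lexicon (illustrative only, not A2DC's lexicon).
LEXICON = {"die": "de", "nie": "niet"}

# Hypothetical orthographic rewrite rules, applied in order.
RULES = [
    (re.compile(r"^sk"), "sch"),  # e.g. skool -> school
    (re.compile(r"y"), "ij"),     # e.g. tyd -> tijd
]


def convert_word(word):
    """Map one Afrikaans token to a Dutch-like form."""
    lower = word.lower()
    if lower in LEXICON:          # whole-word lookup takes priority
        return LEXICON[lower]
    for pattern, replacement in RULES:
        lower = pattern.sub(replacement, lower)
    return lower


def convert_sentence(tokens):
    """Apply word-level conversion to every token in a sentence."""
    return [convert_word(token) for token in tokens]


def annotate(tokens, tagger, convert=True):
    """Run a caller-supplied tagger on raw or converted tokens.

    `tagger` is an assumed callable: list of tokens -> list of tags.
    """
    return tagger(convert_sentence(tokens) if convert else tokens)


def accuracy(predicted, gold):
    """Fraction of positions where the prediction matches the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

Comparing `accuracy` on the output of `annotate(..., convert=False)` and `annotate(..., convert=True)` against the same gold tags mirrors the raw-versus-converted comparison reported above, under the assumption that conversion keeps a one-to-one token alignment.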
