Programme booklet (pdf)

CLIN 21 – CONFERENCE PROGRAMME

Technology recycling for closely related languages: Dutch and Afrikaans

Pilon, Suléne (1) and Van Huyssteen, Gerhard (2)
(1) North-West University (VTC)
(2) North-West University (PC)

Abstract

If two languages (L1 and L2) are similar enough, the development of technologies for L2 can be expedited by recycling existing L1 resources. This process is called technology recycling, and its success depends greatly on the degree of similarity between the two languages in question. Other strategies can, however, be employed to improve the efficiency of L1 technologies on L2 data, and in this research we experiment with one such strategy, viz. lexical conversion as a pre-processing step.

We explore the possibility of using rule-based lexical conversion to improve the accuracy of Dutch technologies when annotating Afrikaans data. The rationale is that Dutch technologies should perform better on Afrikaans data that appears more Dutch-like, even if the conversion does not yield a good Dutch translation. To do the lexical conversion, we developed an Afrikaans to Dutch convertor (A2DC), which obtains an accuracy of more than 72% when converting Afrikaans words to Dutch.

For our experiment we use a state-of-the-art Dutch POS tagger and parser to annotate raw Afrikaans data. The same data is then converted with A2DC and once again annotated with the Dutch technologies. In both experiments the conversion has a notably positive effect on the performance of the Dutch technologies. The biggest difference is observed in the POS tagging task, where the overall accuracy increases from 62.6% when annotating raw Afrikaans data to 80.6% when annotating converted data, while the parsing F-score improves from 0.44 (raw data) to more than 0.68 (converted data).

Corresponding author: sulene.pilon@nwu.ac.za
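As a rough illustration of what rule-based lexical conversion as a pre-processing step can look like, the sketch below combines a small word lexicon with one orthographic rewrite rule. It is not A2DC: the abstract does not describe the convertor's rules, so the names `LEXICON`, `RULES`, `convert_word` and `convert_sentence`, and all entries in them, are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical word-level lexicon; these entries are illustrative
# and are NOT taken from A2DC.
LEXICON = {
    "die": "de",    # simplification: the Afrikaans article mapped to Dutch "de"
    "nie": "niet",
}

# Hypothetical orthographic rules, applied in order to words that are
# not found in the lexicon.
RULES = [
    (re.compile(r"y"), "ij"),  # Afrikaans "y" often corresponds to Dutch "ij" (tyd -> tijd)
]


def convert_word(word):
    """Convert one Afrikaans word to a more Dutch-like form (lowercased for simplicity)."""
    lower = word.lower()
    if lower in LEXICON:
        return LEXICON[lower]
    for pattern, replacement in RULES:
        lower = pattern.sub(replacement, lower)
    return lower


def convert_sentence(sentence):
    """Convert a whitespace-tokenised sentence word by word."""
    return " ".join(convert_word(w) for w in sentence.split())


print(convert_sentence("die tyd is nie my tyd"))
```

The output of such a conversion need not be correct Dutch; as the abstract notes, the aim is only to make the input look more Dutch-like before it is passed to a Dutch tagger or parser.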
PRESENTATION ABSTRACTS

The more the merrier? How data set size and noisiness affect the accuracy of predicting the dative alternation

Theijssen, Daphne and van Halteren, Hans and Boves, Lou and Oostdijk, Nelleke
Radboud University Nijmegen

Abstract

In the dative alternation in English, speakers and writers choose between the prepositional dative construction ('I gave the ball to him') and the double object construction ('I gave him the ball'). Logistic regression models have already been shown to be able to predict over 90% of the choices correctly (e.g. Bresnan et al. 2007).

Collecting dative instances from a corpus and encoding them with the required information is a costly procedure. We therefore developed a semi-automatic approach to do this, consisting of three steps: (1) automatically extracting dative candidates, (2) manually approving or rejecting these candidates, and (3) automatically annotating the approved candidates with the required information. The resulting data sets are noisier than data sets that have been checked completely manually, but the approach can yield much larger data sets.

We compare the effect of data set size and noisiness on the accuracy of predicting the dative alternation. We employ a 'manual' set of 2,877 instances in spoken English, taken from Switchboard (Godfrey et al. 1992) by Bresnan et al. (2007) and from ICE-GB (Greenbaum 1996) by Theijssen (2010). In addition, we use a 'semi-automatic' set with 7,755 instances from Switchboard, ICE-GB and BNC (BNC Consortium 2007). We compare the learning curves of various machine learning algorithms by randomly selecting subsets of the data and extending them with 500 instances each time. We do this for different levels of noisiness, i.e. varying the proportion of 'semi-automatic' instances (0%, 25%, 50%, 75%, 100%). The results are presented at the conference.

References

BNC Consortium (2007). The British National Corpus, version 3 (BNC XML Edition). Oxford University Computing Services.

Bresnan, Joan, Anna Cueni, Tatiana Nikitina and R. Harald Baayen (2007). Predicting the Dative Alternation. In Bouma, Gerlof, Irene Kraemer and Joost Zwarts (eds.), Cognitive
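The sampling scheme described in the abstract above (random subsets that grow by 500 instances, at a fixed proportion of 'semi-automatic' instances) could be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' procedure: the function name `learning_curve_subsets`, the assumption that the two pools are disjoint, and the placeholder instances are all hypothetical, and the classifier training on each subset is left out.

```python
import random


def learning_curve_subsets(manual, semi_auto, proportion, step=500, seed=0):
    """Yield growing training subsets for one noisiness level.

    Each step adds `step` new instances, of which `proportion` come from the
    semi-automatic pool and the rest from the manual pool. Generation stops
    as soon as either pool cannot supply its share of a further step.
    Assumption: the two pools are disjoint lists of instances.
    """
    rng = random.Random(seed)
    manual, semi_auto = list(manual), list(semi_auto)
    rng.shuffle(manual)
    rng.shuffle(semi_auto)

    n_semi = round(step * proportion)
    n_manual = step - n_semi
    subset, m, s = [], 0, 0
    while m + n_manual <= len(manual) and s + n_semi <= len(semi_auto):
        subset.extend(manual[m:m + n_manual])
        subset.extend(semi_auto[s:s + n_semi])
        m += n_manual
        s += n_semi
        yield list(subset)  # copy, so earlier subsets are not mutated later


# Placeholder instances with the pool sizes given in the abstract.
manual_pool = [("manual", i) for i in range(2877)]
semi_pool = [("semi", i) for i in range(7755)]

for proportion in (0.0, 0.25, 0.5, 0.75, 1.0):
    sizes = [len(sub) for sub in learning_curve_subsets(manual_pool, semi_pool, proportion)]
    print(proportion, sizes)
```

In a full experiment, each yielded subset would be used to train a classifier whose accuracy on held-out data gives one point on the learning curve for that noisiness level.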
Ghent, February 11th 2011