A Treebank-based Investigation of IPP-triggering Verbs in Dutch
obtained with MaltParser is 76.61% labeled attachment score (LAS), 88.45% unlabeled attachment
score (UAS) and 80.03% label accuracy (LA). Both parsers achieve their best result in the
same experimental setting, the one incorporating case information, which
further strengthens our conjecture about the importance of case markers. All the experiments
suggest that the features successively included in the system are effective.
After incorporating the POS tag, chunk tag, lemma and case markers, both
parsers showed a steep increase in performance. However, GNP (gender, number, person)
information had a negative impact on the performance of both parsers. In Urdu, a
verb mainly agrees with its nominative argument, which can be either an ‘agent’
or a ‘theme’. In sentences where both arguments of a transitive verb are in
nominative form, the verb agrees with the ‘agent’. A verb can also agree with a constituent
inside its clausal complement, a phenomenon called Long Distance Agreement.
This agreement behavior of the verb is probably what affects the parsing
accuracy. Among all the features incorporated, case markers played the
major role in improving parsing performance. A 10.74% increment in LAS for
MaltParser with case information makes a clear statement about the importance of
morphology-based parsing of MRLs.
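The three scores reported above can be illustrated with a minimal sketch. It assumes the usual definitions: UAS counts tokens with the correct head, LA counts tokens with the correct label, and LAS counts tokens with both correct. The function name `attachment_scores`, the toy sentence and its labels are hypothetical and are not taken from the experiments.

```python
from typing import Dict, List, Tuple

# One arc per token: (index of the head token, dependency label).
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Dict[str, float]:
    """Return LAS, UAS and LA as percentages over aligned token lists."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must have the same length")
    if not gold:
        raise ValueError("cannot score an empty analysis")

    n = len(gold)
    # UAS: head is correct, label ignored.
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LA: label is correct, head ignored.
    la = sum(g[1] == p[1] for g, p in zip(gold, pred))
    # LAS: both head and label are correct.
    las = sum(g == p for g, p in zip(gold, pred))
    return {"LAS": 100.0 * las / n, "UAS": 100.0 * uas / n, "LA": 100.0 * la / n}


# Toy four-token example (illustrative labels, not real treebank data).
gold_arcs = [(2, "k1"), (0, "main"), (2, "k2"), (3, "r6")]
pred_arcs = [(2, "k1"), (0, "main"), (2, "k1"), (2, "r6")]
scores = attachment_scores(gold_arcs, pred_arcs)
```

In this toy case one token has a wrong label and another a wrong head, so LAS is lower than both UAS and LA, which mirrors the ordering of the figures reported above.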
During the error analysis, the primary source of confusion was found to be the granularity
of the tag-set. For example, the labels ‘r6’ and ‘r6-k2’ (genitive relations) were
confused with each other in a total of 50 cases. This can be explained by the
fact that the labels differ only slightly in their semantics, and it is potentially not possible
to disambiguate them based on the simple features used in the experiments. The
granularity issue would subside if a coarse-grained tag-set that ignores
finer distinctions were used.
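The effect of a coarser tag-set on label confusions can be sketched as follows. The mapping rule used here (keep only the part of a label before the first hyphen, so that ‘r6-k2’ collapses into ‘r6’) is an assumption made for illustration, not the coarse-grained tag-set itself; the helper names `coarsen` and `confusions` and the toy label sequences are likewise hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple


def coarsen(label: str) -> str:
    """Assumed coarsening rule: drop everything after the first hyphen."""
    return label.split("-", 1)[0]


def confusions(
    gold: Iterable[str],
    pred: Iterable[str],
    mapper: Callable[[str], str] = lambda label: label,
) -> Counter:
    """Count (gold, predicted) label pairs that disagree after mapping."""
    counts: Counter = Counter()
    for g, p in zip(gold, pred):
        g_mapped, p_mapped = mapper(g), mapper(p)
        if g_mapped != p_mapped:
            counts[(g_mapped, p_mapped)] += 1
    return counts


# Toy label sequences (illustrative only).
gold_labels = ["r6", "r6-k2", "r6", "k1"]
pred_labels = ["r6-k2", "r6", "r6", "k2"]

fine_errors = confusions(gold_labels, pred_labels)
coarse_errors = confusions(gold_labels, pred_labels, mapper=coarsen)
```

With the fine-grained labels, the two ‘r6’/‘r6-k2’ swaps count as errors; after coarsening they disappear and only the unrelated ‘k1’/‘k2’ confusion remains.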
5 Conclusion and Future Work
This paper presents our efforts towards the dependency parsing of Urdu. It is an
attempt to explore the importance of the linguistic information encoded in the morphology
of the language for data-driven parsing. Our main inference from the
experiments is that some morphological features, namely the case markers, play a vital
role in parsing, whereas morphological information on gender,
number and person (agreement features) has not delivered any improvement. In
the process we have also investigated the extent of the impact of non-projectivity and
inspected the role of POS tags and lemmas in parsing accuracy.
So far we have reported our work on chunk heads only, which we wish to
extend to full parsing, with chunks expanded along with their intra-chunk dependencies.
In addition, the current dependency tag-set is highly fine-grained, consisting of
53 tags, and practical applications often do not need such a deep analysis.
Our future efforts will therefore aim at an efficient parser with coarse-grained
dependency labels.