06.07.2014 Views

A Treebank-based Investigation of IPP-triggering Verbs in Dutch


obtained with MaltParser is 76.61% labeled attachment score (LAS), 88.45% unlabeled attachment score and 80.03% label accuracy score. The best result for both parsers is obtained with the same experimental settings, incorporating case information, which further strengthens our conjecture on the importance of case markers. All the experiments suggest the effectiveness of the features successively included in the system. By incorporating the POS tag, chunk tag, lemma and case markers, both parsers have shown a steep increase in their performance. However, GNP (gender, number, person) information had a negative impact on the performance of both parsers. In Urdu, the verb mainly agrees with its nominative argument, which can be either an ‘agent’ or a ‘theme’. In sentences where both arguments of a (transitive) verb are in nominative form, it agrees with the ‘agent’. A verb can also agree with a constituent inside its clausal complement, a phenomenon called Long Distance Agreement. The drop is probably due to this agreement behavior of the verb, which affects the parsing accuracy. Among all the features incorporated, case markers have played a major role in improving the parsing performance. A 10.74% increment in LAS in MaltParser with case information makes a clear statement about the importance of morphology-based parsing of MRLs.
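For readers less familiar with the three scores quoted above, the following is a minimal sketch of how they are conventionally computed from token-level (head, label) pairs. It is not the evaluation script used in the experiments; the function name `attachment_scores`, the toy sentence and the `main` root label are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

# One arc per token: (index of the head token, dependency label).
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Dict[str, float]:
    """Return LAS, UAS and label accuracy (LA) as percentages."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equally long")
    n = len(gold)
    # UAS: the predicted head is correct, the label is ignored.
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n
    # LA: the predicted label is correct, the head is ignored.
    la = sum(g[1] == p[1] for g, p in zip(gold, pred)) / n
    # LAS: both head and label are correct.
    las = sum(g == p for g, p in zip(gold, pred)) / n
    return {"LAS": 100 * las, "UAS": 100 * uas, "LA": 100 * la}


# Hypothetical four-token sentence (labels are illustrative only).
gold = [(2, "k1"), (0, "main"), (2, "k2"), (3, "r6")]
pred = [(2, "k1"), (0, "main"), (2, "k1"), (2, "r6")]
scores = attachment_scores(gold, pred)
print(scores)
```

In this toy case two of the four tokens have both head and label right, so LAS is lower than either UAS or LA, which mirrors the ordering of the three figures reported above.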

During the error analysis, the primary confusion is observed to be due to the granularity of the tag-set. For example, the labels ‘r6’ and ‘r6-k2’ (genitive relations) have been incorrectly interchanged in a total of 50 cases. This can be explained by the fact that the labels differ only slightly in their semantics, and it may not be possible to disambiguate them based on the simple features used in the experiments. The issue of granularity will automatically subside if a coarse-grained tag-set that ignores finer distinctions is used.
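The kind of label confusion described above, and the effect of collapsing fine-grained labels, can be illustrated with a short sketch. The helper names `label_confusions` and `coarsen`, the toy label sequences and the mapping of ‘r6-k2’ onto ‘r6’ are all hypothetical; the paper does not specify a coarse-grained tag-set.

```python
from collections import Counter
from typing import Dict, List, Tuple


def label_confusions(gold: List[str], pred: List[str]) -> Counter:
    """Count (gold label, predicted label) pairs that disagree."""
    return Counter((g, p) for g, p in zip(gold, pred) if g != p)


def coarsen(label: str, mapping: Dict[str, str]) -> str:
    """Map a fine-grained label to its coarse class, if one is defined."""
    return mapping.get(label, label)


# Assumed mapping, for illustration only: fold 'r6-k2' into 'r6'.
COARSE: Dict[str, str] = {"r6-k2": "r6"}

# Toy label sequences, not data from the experiments.
gold_labels = ["r6", "r6-k2", "k1", "r6-k2"]
pred_labels = ["r6-k2", "r6", "k1", "r6-k2"]

fine = label_confusions(gold_labels, pred_labels)
coarse = label_confusions(
    [coarsen(g, COARSE) for g in gold_labels],
    [coarsen(p, COARSE) for p in pred_labels],
)
print(sum(fine.values()), sum(coarse.values()))
```

Under the assumed mapping, the two interchanged genitive labels no longer count as errors, which is the sense in which the granularity issue subsides with a coarser tag-set.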

5 Conclusion and Future Work

This paper presents our efforts towards the dependency parsing of Urdu. It is an attempt to explore the importance of linguistic information encoded in the morphology of the language for data-driven parsing. Our main inference from the experiments is that some morphological features, viz. the case markers, play a vital role in parsing, whereas morphological information on gender, number and person (agreement features) has not delivered any improvement. In the process we have also investigated the extent of the impact of non-projectivity, and inspected the role of POS tags and lemma on parsing accuracy.

We have so far reported our work on the chunk heads, which we wish to extend to full parsing with chunks expanded along with their intra-chunk dependencies. Also, the current dependency tag-set is highly fine-grained, consisting of 53 tags, and practical applications often do not need such deep analysis. Our efforts will therefore be directed at an efficient parser with coarse-grained dependency labels.
