06.07.2014 Views

A Treebank-based Investigation of IPP-triggering Verbs in Dutch


Experiments on Dependency Parsing of Urdu

Riyaz Ahmad Bhat, Sambhav Jain and Dipti Misra Sharma

Language Technology Research Center, IIIT-Hyderabad, India

{riyaz.bhat|sambhav.jain}@research.iiit.ac.in, dipti@iiit.ac.in

Abstract

In this paper, we present our parsing efforts for Urdu, a South Asian language with rich morphology. We try to identify the important aspects of Urdu morphology that can be exploited for efficient syntactic parsing. The most important feature is the presence of case clitics on nominals, which mark the kind of relation they bear to their heads. In Urdu, unlike in a positional language, the arguments of the verb are expressed not through their absolute position in the sentence but through the morphology they carry. Experiments using the Urdu Dependency Treebank (UDT) show the significant impact of case markers on the parsing accuracy of Urdu.

In this paper we experiment with dependency parsing of Urdu using the Urdu Dependency Treebank (UDT). The UDT contains 3226 sentences (approx. 0.1M words) annotated at multiple levels, viz. the morphological, part-of-speech (POS), chunk and dependency levels. Apart from the parsing experiments, we also report some of the problem areas and issues concerning the dependency parsing of Urdu.

1 Introduction

Parsing morphologically rich languages (MRLs) like Arabic, Czech, Turkish, etc., is a challenging task [9]. A large inventory of word-forms, higher degrees of argument scrambling, discontinuous constituents, long-distance dependencies and case syncretism are some of the challenges that any statistical parser has to meet for efficient parsing of MRLs. Due to the flexible word order of MRLs, dependency representations are preferred over constituency for their syntactic analysis. Dependency representations do not constrain the order of words in a sentence and are thus better suited to the flexible ordering of words in such languages. Like other MRLs, Indian languages are rich in morphology and allow higher degrees of argument scrambling. [1] have proposed a dependency-based annotation scheme for the syntactic analysis of Indian languages. A number of dependency treebanks are currently under development following this annotation scheme. The Urdu treebank, which we use in this work, is one of them [3].
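The claim that dependency representations do not constrain word order can be illustrated with a minimal sketch. The example is ours and is not drawn from the UDT: the transliterated tokens, the `subj`/`obj` labels and the helper `arcs` are all hypothetical, and the case clitic is simply written attached to its nominal (`Ali_ne`). The sketch reduces each analysis to a set of labelled head-dependent arcs and shows that two scrambled orders of the same sentence yield the same set.

```python
from typing import List, Set, Tuple

# (dependent, relation, head); labels here are illustrative, not the UDT tagset
Arc = Tuple[str, str, str]


def arcs(tokens: List[str], heads: List[int], labels: List[str]) -> Set[Arc]:
    """Return the labelled arcs of a dependency tree, discarding linear positions.

    heads[i] is the index of the head of tokens[i], or -1 for the root.
    """
    result: Set[Arc] = set()
    for i, (head, label) in enumerate(zip(heads, labels)):
        head_word = "ROOT" if head == -1 else tokens[head]
        result.add((tokens[i], label, head_word))
    return result


# Assumed toy sentence, roughly "Ali read the book", in two word orders.
canonical = arcs(["Ali_ne", "kitaab", "paRhii"], [2, 2, -1], ["subj", "obj", "root"])
scrambled = arcs(["kitaab", "Ali_ne", "paRhii"], [2, 2, -1], ["obj", "subj", "root"])

# Both orders describe the same tree once positions are ignored.
same_tree = canonical == scrambled
```

In this toy setting the case-marked nominal keeps its relation to the verb wherever it appears in the sentence.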

In recent times the availability of large-scale syntactic treebanks has led to a manifold increase in the development of data-driven parsers. CoNLL shared task
