A Treebank-based Investigation of IPP-triggering Verbs in Dutch
Non-Projective Structures in Indian Language Treebanks
Riyaz Ahmad Bhat and Dipti Misra Sharma
Language Technology Research Center, IIIT-Hyderabad, India
E-mail: riyaz.bhat@research.iiit.ac.in, dipti@iiit.ac.in
Abstract<br />
In recent years, non-projective structures have been widely studied across different languages. These dependency structures have been reported to limit parsing efficiency and to pose problems for grammatical formalisms. Non-projective structures are particularly frequent in morphologically rich languages like Czech and Hindi [8], [10]. In Hindi, a large share of parsing errors is due to non-projective structures [6], which motivates a thorough analysis of these structures, at both the linguistic and the formal level, in Hindi and other related languages. In this work we study non-projectivity in Indian languages (ILs), which are morphologically rich and have relatively free word order. We present a formal characterization and a linguistic categorization of non-projective dependency structures across four Indian language treebanks.
1 Introduction<br />
Non-projective structures, in contrast to projective dependency structures, contain a node with a discontinuous yield. These structures are common in natural languages and particularly frequent in morphologically rich languages with flexible word order, such as Czech and German. In the recent past, the formal characterization of non-projective structures has been thoroughly studied, motivated by the challenges these structures pose to dependency parsing [7], [11], [5]. Other studies have tried to provide an adequate linguistic description of non-projectivity in individual languages [4], [10]. Mannem et al. [10] carried out a preliminary study on the Hyderabad Dependency Treebank (HyDT), a pilot dependency treebank of Hindi containing 1865 sentences annotated with dependency structures, and identified the different construction types in the treebank that exhibit non-projectivity.
In this work we present our analysis of non-projectivity across four IL treebanks. ILs are morphologically rich: grammatical relations are expressed through the morphology of words rather than through syntax. This allows words in these languages to move around relatively freely within the sentence. As we will see in subsequent sections, such movement quite often leads to non-projectivity in the dependency structure. We studied treebanks of four Indian languages, namely Hindi, Urdu and Bangla (all Indo-Aryan) and Telugu (Dravidian). All four have an unmarked Subject-Object-Verb (SOV) word order; however, this order can be altered under appropriate pragmatic conditions. Movement of arguments and modifiers away from their heads is the main phenomenon that induces non-projectivity in these languages.
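The definition of non-projectivity as a node with a discontinuous yield can be made concrete with a small check. The following is a minimal Python sketch, not the authors' tooling: it assumes a dependency tree encoded as a list of head indices (tokens numbered from 1, head 0 for the root), and the function names `yields`, `discontinuous_nodes` and `is_projective`, as well as the glossed toy sentence, are hypothetical illustrations rather than data from the treebanks.

```python
from collections import defaultdict


def yields(heads):
    """Return the yield (set of token positions) of every token.

    Assumption: heads[i - 1] is the head of token i, tokens are numbered
    from 1, and 0 marks the artificial root. The input must be a tree.
    """
    children = defaultdict(list)
    for dep, head in enumerate(heads, start=1):
        children[head].append(dep)

    result = {}

    def collect(node):
        # The yield of a node is the node itself plus the yields of its children.
        span = {node}
        for child in children[node]:
            span |= collect(child)
        result[node] = span
        return span

    for root_child in children[0]:
        collect(root_child)
    return result


def discontinuous_nodes(heads):
    """List the tokens whose yield is not a contiguous interval."""
    return sorted(
        node
        for node, span in yields(heads).items()
        if max(span) - min(span) + 1 != len(span)
    )


def is_projective(heads):
    """A tree is projective iff no node has a discontinuous yield."""
    return not discontinuous_nodes(heads)


# Hypothetical glossed SOV sentence: "Ram(1) book(2) read(3)".
canonical = [3, 3, 0]
# Same sentence with a modifier of "book" moved after the verb:
# "Ram(1) book(2) read(3) interesting(4)".
moved = [3, 3, 0, 2]

print(is_projective(canonical))    # True
print(discontinuous_nodes(moved))  # [2]
```

In the canonical order every yield is an interval, so the tree is projective. Once the modifier of *book* moves to the right of the verb, the yield of *book* becomes {2, 4}, the verb at position 3 interrupts it, and the tree becomes non-projective, mirroring in miniature the movement of modifiers away from their heads described above.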