A Treebank-based Investigation of IPP-triggering Verbs in Dutch
In this paper, we discuss the constraints and measures evaluated by [8], [12]. We evaluate these measures on Indian language (IL) treebanks and follow this with a linguistic description of non-projective structures, focusing on identifying and categorizing the grammatical structures that can readily undergo non-projectivity and the possible reasons for it.
The paper is organized as follows. In Section 2, we give an overview of Indian Language Treebanking with reference to the treebanks used in this work. Section 3 discusses different constraints on dependency trees, followed by the empirical results of our experiments in Section 4. In Section 5, we present an in-depth analysis of the non-projective structures allowed in Indian languages. Finally, Section 6 concludes the paper.
2 Treebanks
In our analysis of non-projective structures in Indian languages, we use treebanks of four languages, namely Hindi, Urdu, Bangla and Telugu. These treebanks are currently being developed following the annotation scheme based on the Computational Paninian Grammar (CPG) [1]. Under this framework, the dependency relations in these treebanks are marked between chunks. A chunk is a minimal, non-recursive structure consisting of a group of closely related words. Thus, in these treebanks a node in a dependency tree is represented by a chunk and not by a word. Table 1 gives an overview of the four treebanks mentioned above. While the Hindi treebanking effort has matured and grown considerably [2], the other three treebanks are still at an initial stage of development. Because of its large size and stable annotations, the Hindi treebank provides major insights into potential sites of non-projectivity. In our work we have ignored intra-chunk dependencies for two reasons: 1) intra-chunk dependencies are currently not being marked in the treebanks, and 2) intra-chunk dependencies are projective; all the non-projective edges are distributed among the inter-chunk relations (as is the case with Hindi [10]).
Language   Sentences   Words / Sentence   Chunks / Sentence
Hindi      20705       20.8               10.7
Urdu       3226        29.1               13.7
Bangla     1279        9.5                6.4
Telugu     1635        9.4                3.9
Table 1: IL treebank statistics
In comparison to other free word order languages like Czech and Danish, which have non-projectivity in 23% (out of 73088 sentences) and 15% (out of 4393 sentences) respectively [8], [5], Indian languages show interesting figures. Urdu has the highest proportion of non-projective sentences: 23% of its 3226 sentences are non-projective. In Hindi the figure drops to 15% of 20705 sentences, and in Bangla to 5% of 1279 sentences. Interestingly, there are no non-projective dependency structures in the Telugu treebank.
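To make the comparison concrete, the percentages quoted above can be turned into approximate sentence counts. The following is a rough sketch only: the counts are back-computed from rounded percentages and are not figures reported in the paper, and the names `REPORTED`, `approx_count` and `APPROX` are hypothetical. The Telugu sentence total is taken from Table 1.

```python
# (language, total sentences, reported % of non-projective sentences).
# Percentages are rounded in the text, so every derived count is approximate.
REPORTED = [
    ("Czech", 73088, 23),
    ("Danish", 4393, 15),
    ("Urdu", 3226, 23),
    ("Hindi", 20705, 15),
    ("Bangla", 1279, 5),
    ("Telugu", 1635, 0),  # no non-projective structures reported
]


def approx_count(sentences: int, percent: int) -> int:
    """Approximate number of non-projective sentences implied by a percentage."""
    return round(sentences * percent / 100)


APPROX = {lang: approx_count(n, p) for lang, n, p in REPORTED}

for lang, n, p in REPORTED:
    print(f"{lang:<7} {p:>3}% of {n:>6} sentences ~ {APPROX[lang]:>6}")
```

Under this approximation, Urdu has the highest proportion of non-projective sentences among the four Indian languages, while Hindi, with its much larger treebank, contributes the most non-projective sentences in absolute terms.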
3 Dependency Graph and its properties
In this section, we give a formal definition of a dependency tree, and subsequently define different constraints on dependency trees, such as projectivity, planarity and well-nestedness.
Dependency Tree: A dependency tree D = (V, E, ≼) is a directed graph with V a set of nodes, E a set of edges showing a dependency relation on V, and ≼ a linear order on V. Every dependency tree satisfies two properties: a) it is acyclic, and b) all nodes have in-degree 1, except the root node, which has in-degree 0.
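The two defining properties can be checked mechanically. The following is a minimal sketch, not the authors' implementation. It assumes that nodes are given as a list whose order encodes ≼, that each edge is a (head, dependent) pair, and that there is exactly one root; the function name `is_dependency_tree` is hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Node = Hashable
Edge = Tuple[Node, Node]  # assumed direction: (head, dependent)


def is_dependency_tree(nodes: Sequence[Node], edges: Sequence[Edge]) -> bool:
    """Check the in-degree and acyclicity properties of a dependency tree.

    `nodes` is listed in linear order; the order itself plays no role in
    these two checks, it only matters for constraints such as projectivity.
    """
    node_set = set(nodes)
    if len(node_set) != len(nodes):
        return False  # duplicate node labels

    in_degree: Dict[Node, int] = {v: 0 for v in nodes}
    children: Dict[Node, List[Node]] = {v: [] for v in nodes}
    for head, dep in edges:
        if head not in node_set or dep not in node_set:
            return False  # edge refers to an unknown node
        in_degree[dep] += 1
        children[head].append(dep)

    # Property (b): one root with in-degree 0, every other node in-degree 1.
    roots = [v for v in nodes if in_degree[v] == 0]
    if len(roots) != 1 or any(d > 1 for d in in_degree.values()):
        return False

    # Property (a): given (b), the graph is acyclic exactly when every
    # node is reachable from the root.
    seen = set()
    stack = [roots[0]]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        stack.extend(children[v])
    return len(seen) == len(nodes)
```

For example, `is_dependency_tree(["a", "b", "c"], [("b", "a"), ("b", "c")])` holds, whereas adding the edge `("a", "c")` gives `c` two heads and the check fails.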
3.1 Condition of Projectivity
The condition of projectivity, in contrast to acyclicity and in-degree, concerns the interaction between the dependency relations and the projection of these relations on