A Treebank-based Investigation of IPP-triggering Verbs in Dutch
the sequential order of nodes in a sentence.
Projectivity : A dependency tree $D = (V, E, \preccurlyeq)$ is projective if it satisfies the following condition: $i \to j,\ \upsilon \in (i, j) \Rightarrow \upsilon \in \mathit{Subtree}_i$. Otherwise $D$ is non-projective.
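The projectivity condition can be checked directly from its definition. The following is only an illustrative sketch, not the authors' tooling: it assumes a hypothetical encoding in which words occupy positions 1..n, `heads[j]` is the position of the head of word j, 0 denotes an artificial root, and `heads[0]` is an unused placeholder. The function names are our own.

```python
from typing import Dict, List, Set


def subtree(heads: List[int], i: int) -> Set[int]:
    """Return Subtree_i: node i plus every node it dominates (assumes a well-formed tree)."""
    children: Dict[int, List[int]] = {}
    for d in range(1, len(heads)):
        children.setdefault(heads[d], []).append(d)
    result: Set[int] = set()
    stack = [i]
    while stack:
        node = stack.pop()
        result.add(node)
        stack.extend(children.get(node, []))
    return result


def is_projective(heads: List[int]) -> bool:
    """True if, for every edge i -> j, each node strictly between i and j is in Subtree_i."""
    for j in range(1, len(heads)):
        i = heads[j]  # the edge i -> j
        dominated = subtree(heads, i)
        lo, hi = min(i, j), max(i, j)
        if any(v not in dominated for v in range(lo + 1, hi)):
            return False
    return True


# Hypothetical encoding: heads[0] is a placeholder; a head of 0 is the artificial root.
print(is_projective([-1, 2, 0, 2]))     # True: word 2 heads words 1 and 3
print(is_projective([-1, 0, 1, 1, 2]))  # False: edge 2 -> 4 spans word 3, not dominated by 2
```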
3.2 Relaxations of Projectivity
As [12] remarks, natural languages permit grammatical constructions that violate the condition of projectivity. In the following, we define the global and edge-based constraints that have been proposed to relax projectivity.
Planarity : A dependency tree $D$ is non-planar if there are two edges $i_1 \leftrightarrow j_1$, $i_2 \leftrightarrow j_2$ in $D$ such that $i_1 \in (i_2, j_2) \wedge i_2 \in (i_1, j_1)$. Otherwise $D$ is planar. Planarity is a relaxation of projectivity and a strictly weaker constraint: every projective tree is planar, but not every planar tree is projective. Non-planarity can be visualized as ‘crossing arcs’ in the horizontal representation of a dependency tree.
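The planarity condition amounts to a pairwise test over edges. This minimal sketch assumes a hypothetical encoding in which words occupy positions 1..n, `heads[j]` is the head of word j, 0 is an artificial root and `heads[0]` is a placeholder. It also assumes that arcs from the artificial root are left out, so only edges between words are compared, and it reads $(a, b)$ as the open interval between a and b in either order. All function names are illustrative.

```python
from itertools import combinations
from typing import List, Tuple


def between(x: int, a: int, b: int) -> bool:
    """x ∈ (a, b): x lies strictly between a and b, in either order."""
    return min(a, b) < x < max(a, b)


def word_edges(heads: List[int]) -> List[Tuple[int, int]]:
    """Edges (head, dependent) between words; arcs from the artificial root 0 are skipped."""
    return [(heads[d], d) for d in range(1, len(heads)) if heads[d] != 0]


def is_planar(heads: List[int]) -> bool:
    """False if two edges i1 <-> j1, i2 <-> j2 satisfy i1 ∈ (i2, j2) and i2 ∈ (i1, j1)."""
    for (h1, d1), (h2, d2) in combinations(word_edges(heads), 2):
        # Edges are undirected (i <-> j), so try both labelings of each edge's endpoints.
        for i1, j1 in ((h1, d1), (d1, h1)):
            for i2, j2 in ((h2, d2), (d2, h2)):
                if between(i1, i2, j2) and between(i2, i1, j1):
                    return False
    return True


# Word 2 is the root; edge 3 -> 1 spans word 2, which word 3 does not dominate.
print(is_planar([-1, 3, 0, 2]))     # True: non-projective, yet no arcs between words cross
print(is_planar([-1, 0, 1, 1, 2]))  # False: edges 1 -> 3 and 2 -> 4 cross
```

The first example illustrates why planarity is the weaker constraint: the tree violates projectivity but has no crossing arcs under the stated assumption.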
Well-nestedness : A dependency tree is ill-nested if two disjoint non-projective subtrees interleave. Two disjoint subtrees, containing nodes $l_1, r_1$ and $l_2, r_2$ respectively, interleave if $l_1 < l_2 < r_1 < r_2$. Otherwise the tree is well-nested.
Gap Degree : A gap in the projection of a node $x_n$ is a pair of nodes $x_j, x_{j+1}$ that are adjacent in the projection (in linear order) such that $x_{j+1} - x_j > 1$. The gap degree of a node, $gd(x_n)$, is the number of such gaps in its projection. The gap degree of a sentence is the maximum gap degree among its nodes [8]. Gap degree corresponds to the maximal number of times the yield of a node is interrupted. A node with gap degree $> 0$ is non-projective.
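Gap degree and well-nestedness can both be computed from node projections. The sketch below is illustrative only. It assumes a hypothetical encoding in which words occupy positions 1..n, `heads[j]` is the head of word j, 0 is an artificial root and `heads[0]` is a placeholder; it computes projections over words only, and the function names are our own.

```python
from itertools import combinations
from typing import Dict, List, Set


def projections(heads: List[int]) -> Dict[int, Set[int]]:
    """Map each word x (positions 1..n) to its projection: x plus every word it dominates."""
    n = len(heads) - 1
    proj = {x: {x} for x in range(1, n + 1)}
    for d in range(1, n + 1):
        h = heads[d]
        while h != 0:  # walk up to the artificial root, adding d to each ancestor
            proj[h].add(d)
            h = heads[h]
    return proj


def gap_degree(projection: Set[int]) -> int:
    """gd(x): adjacent nodes of the sorted projection whose positions differ by more than 1."""
    ordered = sorted(projection)
    return sum(1 for a, b in zip(ordered, ordered[1:]) if b - a > 1)


def sentence_gap_degree(heads: List[int]) -> int:
    """Maximum gap degree over all words of the sentence."""
    return max(gap_degree(p) for p in projections(heads).values())


def interleave(t1: Set[int], t2: Set[int]) -> bool:
    """True if l1 < l2 < r1 < r2 for some l1, r1 in t1 and l2, r2 in t2, or vice versa."""
    labels = [1 if x in t1 else 2 for x in sorted(t1 | t2)]
    for pattern in ((1, 2, 1, 2), (2, 1, 2, 1)):
        k = 0  # greedily match the pattern as a subsequence of the labels
        for label in labels:
            if k < len(pattern) and label == pattern[k]:
                k += 1
        if k == len(pattern):
            return True
    return False


def is_well_nested(heads: List[int]) -> bool:
    """False if two disjoint subtrees interleave.

    Interleaving subtrees always contain gaps, so checking every disjoint pair
    covers the 'non-projective subtrees' in the definition.
    """
    proj = projections(heads)
    for u, v in combinations(proj, 2):
        if proj[u].isdisjoint(proj[v]) and interleave(proj[u], proj[v]):
            return False
    return True


# Words 3 and 4 head {1, 3} and {2, 4}: disjoint subtrees that interleave.
ill_nested = [-1, 3, 4, 5, 5, 0]
print(sentence_gap_degree(ill_nested))  # 1
print(is_well_nested(ill_nested))       # False
```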
Edge Degree : For any edge in a dependency tree, we define its edge degree as the number of connected components in the span of the edge that are not dominated by the parent node of the edge. That is, $ed_{i \leftrightarrow j}$ is the number of components in $span(i, j)$ that do not belong to $\pi_{parent_{i \leftrightarrow j}}$.
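Edge degree can be sketched as a component count over the nodes an edge spans but its parent does not dominate. This is an illustrative sketch under stated assumptions: a hypothetical encoding in which words occupy positions 1..n, `heads[j]` is the head of word j, 0 is an artificial root and `heads[0]` is a placeholder; `dominated_by` plays the role of $\pi_{parent_{i \leftrightarrow j}}$; and components are assumed to be those of the subgraph induced by the undominated nodes in the span. The function names are our own.

```python
from typing import Dict, List, Set


def dominated_by(heads: List[int], i: int) -> Set[int]:
    """π_i: node i plus every node it dominates (0 is the artificial root)."""
    result = {i}
    for v in range(1, len(heads)):
        h = v
        while h != 0 and h != i:  # climb towards the root until we meet i or the root
            h = heads[h]
        if h == i:
            result.add(v)
    return result


def edge_degree(heads: List[int], j: int) -> int:
    """ed of the edge heads[j] <-> j: components in its span not dominated by heads[j]."""
    i = heads[j]
    parent_proj = dominated_by(heads, i)
    lo, hi = min(i, j), max(i, j)
    outside = {v for v in range(lo + 1, hi) if v not in parent_proj}
    # Assumption: components are those of the subgraph induced by `outside`.
    adjacency: Dict[int, List[int]] = {v: [] for v in outside}
    for d in outside:
        if heads[d] in outside:
            adjacency[d].append(heads[d])
            adjacency[heads[d]].append(d)
    seen: Set[int] = set()
    components = 0
    for start in outside:
        if start in seen:
            continue
        components += 1  # a new component: flood-fill it with a stack
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
    return components


def max_edge_degree(heads: List[int]) -> int:
    """Highest edge degree over all edges of the tree."""
    return max(edge_degree(heads, j) for j in range(1, len(heads)))


# Edge 2 -> 5 spans words 3 and 4, siblings under word 1 that word 2 does not dominate.
print(edge_degree([-1, 0, 1, 1, 1, 2], 5))  # 2
# Same span, but word 3 now depends on word 4, so the two form a single component.
print(edge_degree([-1, 0, 1, 4, 1, 2], 5))  # 1
```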
4 Empirical Results<br />
In this section, we present an experimental evaluation of the dependency tree constraints defined in the previous section on the dependency structures of the IL treebanks. Owing to its relatively large size, the Hindi treebank provides good insight into the construction types that give rise to non-projectivity in ILs. The Urdu and Bangla treebanks, though comparatively smaller, show similar construction types. Our analysis of the Telugu treebanks, on the other hand, found no non-projective structures. Possible types of potential non-projective constructions and the phenomena inducing non-projectivity are listed in Table 3. In Table 2, we report the percentage of structures that satisfy various graph properties across the IL treebanks. Non-projective structures make up 23% of the Urdu treebank, 15% of the Hindi treebank and 5% of the Bangla treebank. In the Hindi and Urdu treebanks, the highest gap degree and edge degree of non-projective structures are 3 and 4 respectively, which agrees with previous results on the Hindi treebank [10]. As shown in Table 2, planarity accounts for more of the data than projectivity, while almost all structures are well-nested: 99.7% in Hindi, 98.3% in Urdu and 99.8% in Bangla. Despite the high coverage of the well-nestedness constraint in these languages, there are linguistic phenomena which give rise to ill-nested structures.
The small share of ill-nested structures (0.3% in Hindi, 1.7% in Urdu and 0.2% in Bangla) does not stem from annotation errors but is linguistically justified. Among the phenomena observed on close inspection of the treebanks are extraposition and topicalization of verbal arguments across clausal conjunctions. Extraposition has also been identified as a source of ill-nestedness by [9]. Sentence (1) shows a typical ill-nested dependency analysis of a sentence from the Hindi treebank. In this sentence, vyakti ‘person’ in the complement clause is relativized by an extraposed relative clause, which contains a nominal expression esa koi jawaab ‘any such answer’ that is itself relativized by another extraposed relative clause.