06.07.2014

A Treebank-based Investigation of IPP-triggering Verbs in Dutch

the sequential order of nodes in a sentence.

Projectivity : A dependency tree $D = (V, E, \preceq)$ is projective if it satisfies the following condition: $i \to j,\ v \in (i, j) \Rightarrow v \in \mathit{Subtree}_i$. Otherwise $D$ is non-projective.
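The projectivity condition can be checked arc by arc. The Python sketch below is illustrative only and rests on assumptions of our own: a sentence of n words is encoded as a hypothetical list `heads`, where `heads[k]` is the position of the head of word k, position 0 is an artificial root, and `heads[0]` is an unused placeholder. The helper names `dominates` and `is_projective` are likewise hypothetical, and the input is assumed to be a well-formed tree.

```python
def dominates(heads, a, v):
    """True if node a dominates node v (reflexively): a lies on v's path to the root."""
    while v != a:
        if v == 0:  # reached the artificial root without meeting a
            return False
        v = heads[v]
    return True


def is_projective(heads):
    """For every arc i -> j, each v strictly between i and j must be in Subtree_i."""
    for j in range(1, len(heads)):
        i = heads[j]
        for v in range(min(i, j) + 1, max(i, j)):
            if not dominates(heads, i, v):
                return False
    return True


# Hypothetical three-word tree: word 2 attaches to the root, words 1 and 3 to word 2.
print(is_projective([-1, 2, 0, 2]))  # True
# Word 3 governs word 1 across word 2, which word 3 does not dominate.
print(is_projective([-1, 3, 0, 2]))  # False
```

In the second tree the arc 3 → 1 spans word 2, whose head is the root, so the condition fails for $v = 2$.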

3.2 Relaxations of Projectivity

As [12] remarks, natural languages allow grammatical constructions that violate the condition of projectivity. In the following, we define the global and edge-based constraints that have been proposed to relax projectivity.

Planarity : A dependency tree $D$ is non-planar if there are two edges $i_1 \leftrightarrow j_1$, $i_2 \leftrightarrow j_2$ in $D$ such that $i_1 \in (i_2, j_2) \wedge i_2 \in (i_1, j_1)$. Otherwise $D$ is planar. Planarity is a relaxation of projectivity and a strictly weaker constraint than projectivity. Non-planarity can be visualized as ‘crossing arcs’ in the horizontal representation of a dependency tree.
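A minimal sketch of the planarity test, again assuming a hypothetical head-list encoding (`heads[k]` is the head of word k, 0 is an artificial root, `heads[0]` is unused). It applies the condition above to every pair of word-to-word edges and tries both endpoint labellings, since the edges are undirected. Leaving out the arcs from the artificial root is our own assumption: if they were included, the root arc 0 → 2 in the first example below would cross 3 → 1 and that tree would count as non-planar.

```python
from itertools import combinations


def strictly_between(x, a, b):
    """x lies in the open interval spanned by a and b, in either order."""
    return min(a, b) < x < max(a, b)


def is_planar(heads):
    """Non-planar if two edges satisfy i1 in (i2, j2) and i2 in (i1, j1)."""
    # Undirected word-to-word edges; arcs from the artificial root 0 are skipped.
    edges = [(heads[d], d) for d in range(1, len(heads)) if heads[d] != 0]
    for e1, e2 in combinations(edges, 2):
        # Try each endpoint of e1 as i1 and each endpoint of e2 as i2.
        for i1, j1 in (e1, e1[::-1]):
            for i2, j2 in (e2, e2[::-1]):
                if strictly_between(i1, i2, j2) and strictly_between(i2, i1, j1):
                    return False
    return True


# Non-projective (arc 3 -> 1 spans word 2, which word 3 does not dominate) but planar.
print(is_planar([-1, 3, 0, 2]))     # True
# Arcs 3 -> 1 and 4 -> 2 cross: 1 < 2 < 3 < 4.
print(is_planar([-1, 3, 4, 0, 3]))  # False
```

The first tree illustrates why planarity is the weaker constraint: it violates projectivity yet contains no crossing word-to-word arcs.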

Well-nestedness : A dependency tree is ill-nested if two disjoint non-projective subtrees interleave. Two disjoint subtrees $l_1, r_1$ and $l_2, r_2$ interleave if $l_1 < l_2 < r_1 < r_2$.

Gap Degree : A gap is a pair of nodes $(x_n, x_{n+1})$ adjacent in the projection $\pi_x$ of a node $x$ such that $x_{n+1} - x_n > 1$. The gap degree $gd(x)$ of a node $x$ is the number of such gaps in
its projection. The gap degree of a sentence is the maximum among the gap degrees of its nodes [8]. Gap degree corresponds to the maximal number of times the yield of a node is interrupted. A node with gap degree $> 0$ is non-projective.
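Both the gap degree and the well-nestedness condition can be computed from node projections. The sketch below is a hedged illustration, not the procedure behind the reported figures; it assumes a hypothetical head-list encoding (`heads[k]` is the head of word k, 0 is an artificial root, `heads[0]` is unused), and all function names are hypothetical. It tests every pair of disjoint subtrees for interleaving; restricting the test to non-projective subtrees is unnecessary, because two disjoint subtrees that interleave each contain a gap.

```python
from itertools import combinations


def dominates(heads, a, v):
    """True if node a dominates node v (reflexively)."""
    while v != a:
        if v == 0:
            return False
        v = heads[v]
    return True


def projection(heads, x):
    """Positions of the words in the subtree rooted at x, in ascending order."""
    return [v for v in range(1, len(heads)) if dominates(heads, x, v)]


def gap_degree(heads, x):
    """Count adjacent pairs (x_n, x_n+1) in the projection with x_n+1 - x_n > 1."""
    proj = projection(heads, x)
    return sum(1 for a, b in zip(proj, proj[1:]) if b - a > 1)


def sentence_gap_degree(heads):
    """Maximum gap degree over all words of the sentence."""
    return max(gap_degree(heads, x) for x in range(1, len(heads)))


def interleave(t1, t2):
    """Disjoint position sets interleave if l1 < l2 < r1 < r2 for some
    l1, r1 in one set and l2, r2 in the other."""
    labels = [p in t1 for p in sorted(t1 | t2)]
    # Collapse runs of equal labels; an alternation of length >= 4 means interleaving.
    runs = [lab for k, lab in enumerate(labels) if k == 0 or lab != labels[k - 1]]
    return len(runs) >= 4


def is_well_nested(heads):
    """Well-nested if no two disjoint subtrees interleave."""
    subtrees = [set(projection(heads, x)) for x in range(1, len(heads))]
    for t1, t2 in combinations(subtrees, 2):
        if t1.isdisjoint(t2) and interleave(t1, t2):
            return False
    return True


print(sentence_gap_degree([-1, 3, 0, 2]))   # 1: the yield of word 3 is {1, 3}
print(is_well_nested([-1, 3, 4, 5, 5, 0]))  # False: {1, 3} and {2, 4} interleave
```

In the second example, the subtrees of words 3 and 4 are disjoint, each has gap degree 1, and their positions alternate, which is exactly the ill-nested configuration.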

Edge Degree : For any edge in a dependency tree, we define the edge degree as the number of connected components in the span of the edge that are not dominated by the parent node of the edge. That is, $ed_{i \leftrightarrow j}$ is the number of components in $\mathrm{span}(i, j)$ that do not belong to $\pi_{\mathrm{parent}_{i \leftrightarrow j}}$.
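A possible implementation of edge degree, under one reading of the definition that is our own assumption: $\mathrm{span}(i, j)$ is taken to be the words strictly between $i$ and $j$, and the components are the connected pieces of the tree restricted to those span words that the parent does not dominate. The head-list encoding (`heads[k]` is the head of word k, 0 is an artificial root, `heads[0]` is unused) and the function names are hypothetical.

```python
def dominates(heads, a, v):
    """True if node a dominates node v (reflexively)."""
    while v != a:
        if v == 0:
            return False
        v = heads[v]
    return True


def edge_degree(heads, dep):
    """Edge degree of the arc heads[dep] -> dep."""
    parent = heads[dep]
    lo, hi = min(parent, dep), max(parent, dep)
    # Words strictly inside the span that the parent does not dominate.
    outside = {v for v in range(lo + 1, hi) if not dominates(heads, parent, v)}
    seen, components = set(), 0
    for start in sorted(outside):
        if start in seen:
            continue
        components += 1  # start of a new connected component
        seen.add(start)
        stack = [start]
        while stack:  # depth-first search over tree arcs that stay inside `outside`
            v = stack.pop()
            neighbours = [heads[v]] + [c for c in range(1, len(heads)) if heads[c] == v]
            for w in neighbours:
                if w in outside and w not in seen:
                    seen.add(w)
                    stack.append(w)
    return components


def sentence_edge_degree(heads):
    """Maximum edge degree over all arcs of the sentence."""
    return max(edge_degree(heads, d) for d in range(1, len(heads)))


# Hypothetical six-word tree: the arc 5 -> 1 spans words 2, 3 and 4.
print(edge_degree([-1, 5, 6, 5, 6, 6, 0], 1))  # 2
```

In this example word 3 belongs to the subtree of word 5, while words 2 and 4 attach to word 6 and are not connected to each other inside the span, so the arc 5 → 1 has edge degree 2.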

4 Empirical Results

In this section, we present an experimental evaluation of the dependency tree constraints defined in the previous section on the dependency structures of the IL treebanks. Among the treebanks, the Hindi treebank, owing to its relatively large size, provides good insight into the construction types that permit non-projectivity in ILs. The Urdu and Bangla treebanks, though comparatively smaller, show similar construction types permitting non-projectivity. Telugu, on the other hand, has no non-projective structures, as the analysis of the Telugu treebanks shows. The possible types of non-projective constructions and the phenomena inducing non-projectivity are listed in Table 3. In Table 2, we report the percentage of structures that satisfy the various graph properties across the IL treebanks. Urdu has 23%, Hindi 15% and Bangla 5% non-projective structures. In the Hindi and Urdu treebanks, the highest gap degree and edge degree for non-projective structures are 3 and 4 respectively, which agrees with previous results on the Hindi treebank [10]. As shown in Table 2, planarity accounts for more data than projectivity, while almost all structures are well-nested: 99.7% in Hindi, 98.3% in Urdu and 99.8% in Bangla. Despite the high coverage of the well-nestedness constraint in these languages, there are linguistic phenomena that give rise to ill-nested structures.

The ill-nested structures (almost 1%) are not annotation errors but are linguistically justified. Phenomena observed upon close inspection of the treebanks include extraposition and topicalization of verbal arguments across clausal conjunctions. Extraposition as a source of ill-nestedness has also been observed by [9]. Sentence (1) shows a typical ill-nested dependency analysis of a sentence from the Hindi treebank. In this sentence, vyakti ‘person’ in the complement clause is relativized by an extraposed relative clause, which contains a nominal expression esa koi jawaab ‘any such answer’ that is in turn relativized by another extraposed relative clause.
