06.07.2014 Views

A Treebank-based Investigation of IPP-triggering Verbs in Dutch


In this paper, we discuss the constraints and measures evaluated by [8], [12]. We evaluate these measures on Indian language (IL) treebanks, followed by an adequate linguistic description of non-projective structures, focusing on the identification and categorization of grammatical structures that can readily undergo non-projectivity and the possible reasons for the same.

The paper is organized as follows. In Section 2, we give an overview of Indian language treebanking with reference to the treebanks used in this work. Section 3 discusses different constraints on dependency trees, followed by the empirical results of our experiments in Section 4. In Section 5, we present an in-depth analysis of the non-projective structures permitted by Indian languages. Finally, Section 6 concludes the paper.

2 Treebanks

In our analysis of non-projective structures in Indian languages, we use treebanks of four languages, namely Hindi, Urdu, Bangla and Telugu. These treebanks are currently being developed following the annotation scheme based on the Computational Paninian Grammar (CPG) [1]. Under this framework, the dependency relations in these treebanks are marked between chunks. A chunk is a minimal, non-recursive structure consisting of a group of closely related words. Thus, in these treebanks a node in a dependency tree is represented by a chunk and not by a word. Table 1 gives an overview of the four treebanks mentioned above. While the Hindi treebanking effort has matured and grown considerably [2], the other three treebanks are still at an initial stage of development. Because of its large size and stable annotations, the Hindi treebank provides major insights into potential sites of non-projectivity. In our work we have ignored intra-chunk dependencies for two reasons: 1) currently intra-chunk dependencies are not being marked in the treebanks, and 2) intra-chunk dependencies are projective; all the non-projective edges are distributed among the inter-chunk relations (as is the case with Hindi [10]).

Language   Sentences   Words/Sentence   Chunks/Sentence
Hindi      20705       20.8             10.7
Urdu       3226        29.1             13.7
Bangla     1279        9.5              6.4
Telugu     1635        9.4              3.9

Table 1: IL treebank statistics

In comparison to other free word order languages like Czech and Danish, which have non-projectivity in 23% (out of 73088 sentences) and 15% (out of 4393 sentences) respectively [8], [5], Indian languages show interesting figures. Urdu has the highest proportion of non-projective sentences: 23% of its 3226 sentences are non-projective. In Hindi the figure drops to 15% of 20705 sentences and in Bangla to 5% of 1279 sentences. Interestingly, there are no non-projective dependency structures in the Telugu treebank.
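To give a sense of scale, the percentages above can be turned into rough sentence counts. The sketch below is only an estimate: the percentages in the text are rounded, so the true counts may differ slightly, and the helper name `approx_nonprojective` is introduced here purely for illustration.

```python
# Sentence counts and rounded non-projectivity percentages quoted in the text.
# Telugu's 0 reflects the statement that its treebank has no non-projective structures.
treebanks = {
    "Urdu": (3226, 23),
    "Hindi": (20705, 15),
    "Bangla": (1279, 5),
    "Telugu": (1635, 0),
}


def approx_nonprojective(sentences: int, percent: int) -> int:
    """Estimate a count from a rounded percentage (integer round-half-up)."""
    return (sentences * percent + 50) // 100


# Approximate number of non-projective sentences per treebank.
estimates = {lang: approx_nonprojective(n, p) for lang, (n, p) in treebanks.items()}
```

Because the inputs are rounded percentages, these values should be read as order-of-magnitude indications and not as reported results.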

3 Dependency Graph and its properties<br />

In this section, we give a formal definition of a dependency tree, and subsequently define different constraints on these dependency trees, such as projectivity, planarity and well-nestedness.

Dependency Tree: A dependency tree D = (V, E, ≼) is a directed graph with V a set of nodes, E a set of edges showing a dependency relation on V, and ≼ a linear order on V. Every dependency tree satisfies two properties: a) it is acyclic, and b) all nodes have in-degree 1, except the root node, which has in-degree 0.
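The two properties in this definition can be checked mechanically. The following is a minimal sketch, assuming a tree is encoded as a map from each node's position in the linear order to the position of its head, with `None` marking the root. This encoding and the function name `is_dependency_tree` are illustrative assumptions, not the representation used in the treebanks.

```python
from typing import Dict, Optional


def is_dependency_tree(heads: Dict[int, Optional[int]]) -> bool:
    """Check the two tree properties on a head map.

    heads[n] is the head of node n, or None if n is the root.
    Storing one head per node already limits in-degree to at most 1,
    so it remains to check for a single root and for acyclicity.
    """
    # Exactly one node may have in-degree 0 (the root).
    roots = [node for node, head in heads.items() if head is None]
    if len(roots) != 1:
        return False

    # Every head must itself be a node of the graph.
    if any(head is not None and head not in heads for head in heads.values()):
        return False

    # Acyclicity: following head links from any node must end at the root
    # without visiting the same node twice.
    for node in heads:
        seen = set()
        current: Optional[int] = node
        while current is not None:
            if current in seen:
                return False
            seen.add(current)
            current = heads[current]
    return True
```

For example, `is_dependency_tree({1: 2, 2: None, 3: 2})` accepts a three-node tree rooted at node 2, while a map in which nodes 1 and 2 head each other is rejected as cyclic.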

3.1 Condition of Projectivity

The condition of projectivity, in contrast to acyclicity and in-degree, concerns the interaction between the dependency relations and the projection of these relations on
