06.07.2014 Views

A Treebank-based Investigation of IPP-triggering Verbs in Dutch


Genitives in Hindi Treebank: An Attempt for Automatic Annotation

Nitesh Surtani, Soma Paul

Language Technologies Research Centre

IIIT Hyderabad

Hyderabad, Andhra Pradesh-500032

nitesh.surtaniug08@students.iiit.ac.in, soma@iiit.ac.in

Abstract

Building a syntactic Treebank manually is a time-consuming and labor-intensive task. The correctness of the annotated data is very important because this resource can be used for developing important NLP tools such as syntactic parsers. In this paper, we examine the genitive construction in the Hindi Treebank with a view to developing a set of rules for automatic annotation of genitive data in the Hindi Treebank. The rules perform quite well, producing an overall accuracy of 89% for correct attachment of a genitive noun to its head and correct labeling of the attachment.

1 Introduction

A syntactically annotated Treebank is a highly useful language resource for many NLP tasks, including parsing and grammar induction, to name a few. Generally, building a Treebank requires an enormous effort by the annotators. But some constructions in a Treebank can be annotated automatically. This on the one hand reduces human effort by decreasing the number of interventions required from the annotator, and on the other hand helps to maintain consistent annotation. For the automatic annotation of the data, three types of cases exist: (1) constructions that have a unique cue that identifies them accurately; (2) constructions that occur in varied contexts but can still be identified accurately with well-designed rules; and (3) constructions that cannot be handled using cues. Case 2 constructions are the interesting ones, which require special attention for their automatic annotation. The genitive construction in Hindi is one such construction that occurs in varied contexts. Though a noun with a genitive case marker generally modifies a noun, it is also found in other contexts, including in relation with verbs, with complex predicates, etc. In this paper, we examine the distribution of genitive data in the Hindi dependency Treebank. The aim is to study syntactic cues from the Treebank for determining the legitimate head of the genitive modifier and to identify the relation between the two. We implement the cues as rules for predicting the correct attachment between a genitive noun and its head. This is an attempt towards semi-automatic annotation of the Treebank for genitive data.
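To make the idea of a cue-based attachment rule concrete, the following is a minimal sketch of the default case only, in which a genitive-marked noun modifies a noun. It is not the rule set developed in this paper. It assumes a hypothetical input of (word, POS) pairs, an illustrative tag set (NOUN, PSP, ADJ, VERB), romanized forms kA/ke/kI for the genitive postposition, and a simple "nearest following noun" heuristic. Attachment to verbs or complex predicates is deliberately left unhandled.

```python
from typing import List, Optional, Tuple

# Assumption: romanized forms of the Hindi genitive postposition.
GENITIVE_MARKERS = {"kA", "ke", "kI"}

# Assumption: a token is a (word form, coarse POS tag) pair; the tag names are illustrative.
Token = Tuple[str, str]


def attach_genitives(tokens: List[Token]) -> List[Tuple[int, Optional[int]]]:
    """Propose a head for each genitive-marked noun.

    Returns (modifier index, head index) pairs. The head is the nearest
    noun after the genitive marker, or None when no such noun exists,
    in which case the decision is left to the annotator.
    """
    proposals = []
    for i in range(len(tokens) - 1):
        _, pos = tokens[i]
        next_word, next_pos = tokens[i + 1]
        # Cue: a noun immediately followed by a genitive postposition.
        if pos == "NOUN" and next_pos == "PSP" and next_word in GENITIVE_MARKERS:
            head = None
            for j in range(i + 2, len(tokens)):
                if tokens[j][1] == "NOUN":
                    head = j
                    break
            proposals.append((i, head))
    return proposals


# Hypothetical example: "rAma kA ghara badZA hE" (Ram's house is big).
sentence = [("rAma", "NOUN"), ("kA", "PSP"), ("ghara", "NOUN"),
            ("badZA", "ADJ"), ("hE", "VERB")]
print(attach_genitives(sentence))  # [(0, 2)]
```

Returning None rather than guessing a head keeps the sketch in line with the semi-automatic setting described above: cases the rule cannot resolve remain with the human annotator.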

The paper is organized as follows: Section 2 presents a detailed study of genitive data in Hindi. Section 3 gives a brief overview of the Hindi Treebank, and Section 4 describes the distribution of genitives in the Hindi Treebank. Section 5 then discusses a rule-based approach for automatic annotation of genitive data. The results and observations are presented in Section 6, and Section 7 concludes the paper.
