A Treebank-based Investigation of IPP-triggering Verbs in Dutch
Genitives in Hindi Treebank: An Attempt for
Automatic Annotation
Nitesh Surtani, Soma Paul
Language Technologies Research Centre
IIIT Hyderabad
Hyderabad, Andhra Pradesh-500032
nitesh.surtaniug08@students.iiit.ac.in, soma@iiit.ac.in
Abstract<br />
Building a syntactic Treebank manually is a time-consuming and labor-intensive
task. The correctness of the annotated data is very important because this
resource can be used for developing important NLP tools such as syntactic parsers.
In this paper, we examine the genitive construction in the Hindi Treebank with a
view to developing a set of rules for automatic annotation of genitive data in the
Hindi Treebank. The rules perform quite well, producing an overall accuracy of 89%
for right attachment of the genitive noun to its head and correct labeling of the
attachment.
1 Introduction<br />
A syntactically annotated Treebank is a highly useful language resource for
many NLP tasks, including parsing and grammar induction, to name a few.
Generally, building a Treebank requires an enormous effort by the annotators.
However, some constructions in a Treebank can be annotated automatically. On
the one hand, this reduces human effort by decreasing the number of
interventions required from the annotator; on the other hand, it helps to
maintain consistent annotation. For automatic annotation of the data, three
types of cases exist: (1) constructions that have a unique cue that identifies
them accurately; (2) constructions that occur in varied contexts but can still
be identified accurately with well-designed rules; and (3) constructions that
cannot be handled using cues. Case 2 constructions are the interesting ones,
which require special attention for their automatic annotation. The genitive
construction in Hindi is one such interesting construction that occurs in
varied contexts. Though a noun with a genitive case marker generally modifies
a noun, it is also found to occur in other contexts, including in relation
with verbs, with complex predicates, etc. In this paper, we examine the
distribution of genitive data in the Hindi dependency Treebank. The aim is to
study syntactic cues from the Treebank for determining the legitimate head of
the genitive modifier and to identify the relation between the two. We
implement the cues as rules for predicting the correct attachment between a
genitive noun and its head. This is an attempt towards developing
semi-automatic annotation of the Treebank for the genitive data.
The paper is organized as follows: Section 2 presents a detailed study of
genitive data in Hindi. Section 3 gives a brief overview of the Hindi Treebank,
and Section 4 discusses the distribution of genitives in the Hindi Treebank.
Section 5 then discusses a rule-based approach for automatic annotation of
genitive data. The results and observations are presented in Section 6, and
Section 7 concludes the paper.