06.07.2014 Views

A Treebank-based Investigation of IPP-triggering Verbs in Dutch


only the lexicon is fine-grained to study the effect of the interaction between POS tag and phrase structure. As seen, there is a positive interaction between them, which means that simplifying the model and losing the detailed syntactic information has an adverse effect on parsing. Comparing Models 5 and 8 indicates that reducing data sparsity results in higher performance. In Model 6, the lexicon and the phrase structure are coarse-grained, while the POS tag is fine-grained. This model is built to study the effect of the available morpho-syntactic information when data sparsity is reduced, without the effect of the HPSG-based annotation. The results of Models 2-4 suggest that class-based parsing, the detailed morpho-syntactic information in the POS tags, and the coarse representation of the annotation at the phrasal level all have a positive impact on parsing. The impacts of these three variables are combined in Model 6, which outperforms all other models in the experiments. In contrast, Model 3, which has the opposite configuration, performs the worst. In Model 7, the lexicon and the POS tag are coarse-grained and the phrase structure is fine-grained, in order to study the effect of the HPSG-based annotation without the impact of the morpho-syntactic information but with less data sparsity. Comparing Models 7 and 8 indicates a negative impact of the HPSG-based annotation on parsing, since it is hard for the parser to determine the type of dependencies when only a coarse representation of the syntactic categories is available. Better performance is obtained when a finer representation of the syntactic categories is available, as seen in Model 2. Finally, in Model 8, coarse-grained representations of the information along all three dimensions are studied. Comparing Models 1 and 8 indicates that better results are obtained with a coarse representation of linguistic knowledge, but even higher results are obtained when, as in Model 6, a richer POS tag is used.
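
The design discussed here is a 2 x 2 x 2 grid over lexicon, POS tag, and phrase structure granularity. The following is a minimal sketch of that grid, not the authors' code. It uses only the configurations stated explicitly in the text for Models 6, 7, and 8; the tuple encoding and the helper names `opposite` and `differing_dimensions` are illustrative assumptions.

```python
from itertools import product

DIMENSIONS = ("lexicon", "pos_tag", "phrase_structure")
LEVELS = ("fine", "coarse")

# All 2 x 2 x 2 = 8 granularity configurations.
GRID = list(product(LEVELS, repeat=len(DIMENSIONS)))


def opposite(config):
    """Flip every dimension between 'fine' and 'coarse'."""
    flip = {"fine": "coarse", "coarse": "fine"}
    return tuple(flip[level] for level in config)


def differing_dimensions(a, b):
    """Names of the dimensions on which two configurations differ."""
    return [dim for dim, x, y in zip(DIMENSIONS, a, b) if x != y]


# Configurations as described in the text (hypothetical encoding).
MODEL_6 = ("coarse", "fine", "coarse")
MODEL_7 = ("coarse", "coarse", "fine")
MODEL_8 = ("coarse", "coarse", "coarse")

# Model 3 is described as having the opposite configuration of Model 6.
MODEL_3 = opposite(MODEL_6)

print(len(GRID))
print(MODEL_3)
print(differing_dimensions(MODEL_7, MODEL_8))
```

Under this encoding, Models 7 and 8 differ only in phrase structure, which is why comparing them isolates the effect of the HPSG-based annotation.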

A further observation from Table 1 concerns the effect of each annotation dimension across all possible configurations. Comparing Models 2 and 1, Models 6 and 4, Models 7 and 3, and Models 8 and 5 shows that the former models beat the latter ones, which indicates that the class-based model always outperforms the word-based model regardless of the annotation of the POS tag and the phrase structure. The effect of POS tag annotation can be studied similarly by comparing Models 1 and 3, Models 2 and 7, Models 4 and 5, and Models 6 and 8. In each pair the former model outperforms the latter, which indicates the superiority of fine-grained POS tag annotation regardless of the lexicon and phrase structure. To study the impact of phrase structure annotation, Models 4 and 1, Models 5 and 3, Models 6 and 2, and Models 8 and 7 are compared. In each pair the former model performs better than the latter, which shows that coarse-grained phrase structure annotation always results in higher parsing performance regardless of the lexicon and POS tag. The differences in performance between all eight models are statistically significant according to a two-tailed t-test (p < 0.01).
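
The pairwise comparisons above are the pairs of models that differ in exactly one annotation dimension. The sketch below derives those pairs mechanically. It is not the authors' code: the `MODELS` table is an assumption read off the prose of this section (Table 1 itself is not reproduced here), and the names `MODELS`, `BETTER`, and `minimal_pairs` are illustrative.

```python
from itertools import combinations

DIMENSIONS = ("lexicon", "pos_tag", "phrase_structure")

# Granularity per model as (lexicon, pos_tag, phrase_structure).
# Assumption: inferred from the text, not copied from Table 1.
# A "coarse" lexicon stands for the class-based model, "fine" for word-based.
MODELS = {
    1: ("fine", "fine", "fine"),
    2: ("coarse", "fine", "fine"),
    3: ("fine", "coarse", "fine"),
    4: ("fine", "fine", "coarse"),
    5: ("fine", "coarse", "coarse"),
    6: ("coarse", "fine", "coarse"),
    7: ("coarse", "coarse", "fine"),
    8: ("coarse", "coarse", "coarse"),
}

# Setting reported in the text as the better one for each dimension.
BETTER = {"lexicon": "coarse", "pos_tag": "fine", "phrase_structure": "coarse"}


def minimal_pairs(models):
    """Group model pairs that differ in exactly one dimension.

    Each pair is ordered (better, worse) according to BETTER.
    """
    pairs = {dim: [] for dim in DIMENSIONS}
    for a, b in combinations(sorted(models), 2):
        differing = [i for i in range(len(DIMENSIONS))
                     if models[a][i] != models[b][i]]
        if len(differing) != 1:
            continue  # more than one dimension changes; not a clean comparison
        idx = differing[0]
        dim = DIMENSIONS[idx]
        better, worse = (a, b) if models[a][idx] == BETTER[dim] else (b, a)
        pairs[dim].append((better, worse))
    return pairs


PAIRS = minimal_pairs(MODELS)
for dim in DIMENSIONS:
    print(dim, sorted(PAIRS[dim]))
```

With this table, each dimension yields four pairs, and they coincide with the model pairs listed in the text for the lexicon, the POS tag, and the phrase structure respectively.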

4 Conclusion

In this paper, we studied the effect of annotation granularity on parsing along three dimensions (lexicon, POS tag, and phrase structure) for Persian. Comparing the
