A Treebank-based Investigation of IPP-triggering Verbs in Dutch

gerundives for a total of some 12,000 non-coindexed null elements out of some 38,000 null elements. This problem has also prevented other attempts at producing a semantically viable corpus of logical forms directly from a mapping of the PTB, undertaken by a number of other researchers working in the LFG framework (Guo et al., 2007), in the HPSG and CCG frameworks, and also in Dependency Grammar, as reported in Nivre and Nilsson (2005).
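
The counts above separate null elements that are coindexed with an antecedent from those that are not. As a minimal sketch, assuming the usual PTB bracket convention in which empty elements appear under the -NONE- tag and a coindexed element ends in a numeric index (as in *T*-1), such a tally could be computed per tree as follows. The function names are hypothetical, and the sample tree is hand-written for illustration, not taken from the corpus.

```python
import re
from typing import List, Tuple

# Split a PTB bracketed tree into "(", ")" and atomic tokens.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")
# Assumption: an empty element is coindexed when it ends in "-<digits>",
# as in *T*-1 or *-2; elements such as a bare * or 0 carry no index.
INDEX_RE = re.compile(r"-\d+$")


def null_elements(tree: str) -> List[str]:
    """Return the leaves of all (-NONE- ...) preterminals, in order."""
    tokens = TOKEN_RE.findall(tree)
    return [
        tokens[i + 2]
        for i in range(len(tokens) - 2)
        if tokens[i] == "(" and tokens[i + 1] == "-NONE-"
    ]


def count_null_elements(tree: str) -> Tuple[int, int]:
    """Return (all null elements, null elements without an index)."""
    leaves = null_elements(tree)
    unindexed = [leaf for leaf in leaves if not INDEX_RE.search(leaf)]
    return len(leaves), len(unindexed)


if __name__ == "__main__":
    # Hand-written example, not from the corpus: the gerundive subject is
    # a bare, unindexed *, while the relative-clause trace *T*-1 is
    # coindexed with WHNP-1.
    sample = """
    (S (NP-SBJ (S (NP-SBJ (-NONE- *))
                  (VP (VBG Leaving) (ADVP (RB early)))))
       (VP (VBZ annoys)
           (NP (NP (DT the) (NN man))
               (SBAR (WHNP-1 (WP who))
                     (S (NP-SBJ (-NONE- *T*-1))
                        (VP (VBD stayed)))))))
    """
    total, unindexed = count_null_elements(sample)
    print(total, unindexed)  # 2 1
```

Summed over all trees, the second figure gives the number of unindexed empty elements, the kind of count behind the figures quoted above (some 12,000 out of 38,000, roughly 31.6%).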

In Branco (2009), the author reviews a possible annotation process for a yet-to-be-constructed resource, which is correctly regarded as the “next generation of semantically annotated corpora” (ibid., 6). However, since the author does not refer to any real, existing resources, the whole discussion remains very theoretical. In a subsequent paper (Branco et al., 2012), the same author presents a parser for the construction of what he calls a “deep linguistic databank, called CINTIL DeepGramBank” (ibid., 1810). In fact, the authors describe the creation of a Logical Form as a side effect:

“As a side effect, it permits to obtain very important payoffs: as the deep linguistic representation of a sentence may encode as much grammatical information as it is viable to associate to a sentence, by constructing a deep linguistic databank one is producing in tandem, and within the same amount of effort, a POS-tagged corpus, a constituency TreeBank, a DependencyBank, a PropBank, or even a LogicalFormBank.”

This is clearly an underestimation of the real problem that has to be solved when moving from a constituency structure-based representation to other levels of representation, where additional information needs to be added, as we will discuss below. In the two papers by Branco quoted above, the authors never refer to existing Logical Form resources, as if no other effort in that direction had been undertaken and accomplished by others.

All these methods go beyond the encoding of surface context-free phrase structure trees to incorporate non-local dependencies. This option requires recovering empty nodes and identifying their antecedents, be they traces or long-distance dependencies. But since the PTB annotators themselves intentionally refused to coindex all those cases that caused some difficulty in the decision process, all work carried out on this resource is flawed, semantically speaking, from the start. I must admit, however, that WN glosses are much simpler sentences than PTB sentences, which, even when restricted to fewer than 40 words, are still too complex and not comparable to definitions.
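
Identifying antecedents, as described above, amounts to pairing each indexed empty element with the constituent whose label carries the same index. The sketch below assumes PTB conventions (antecedent labels such as WHNP-1 or NP-SBJ-2, empty elements such as *T*-1) and ignores other relations such as gapping (=n); the function name link_traces is hypothetical. Unindexed elements such as a bare * are skipped, since the annotation offers nothing to link them to, which is precisely the gap left by the annotators' refusal to coindex difficult cases.

```python
import re
from typing import Dict, List, Optional, Tuple

# Split a PTB bracketed tree into "(", ")" and atomic tokens.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")
# Assumption: an index is a trailing "-<digits>" on a label or empty element.
INDEX_RE = re.compile(r"-(\d+)$")


def link_traces(tree: str) -> List[Tuple[str, Optional[str]]]:
    """Pair each indexed empty element with the label bearing its index.

    Returns (empty element, antecedent label) pairs; the label is None
    when no constituent in the tree carries the matching index.
    """
    tokens = TOKEN_RE.findall(tree)
    antecedents: Dict[str, str] = {}
    traces: List[Tuple[str, str]] = []
    for i in range(len(tokens) - 1):
        if tokens[i] != "(":
            continue
        label = tokens[i + 1]  # the token after "(" is a node label
        if label == "-NONE-" and i + 2 < len(tokens):
            leaf = tokens[i + 2]
            m = INDEX_RE.search(leaf)
            if m:  # unindexed elements (bare *, 0, ...) are skipped
                traces.append((leaf, m.group(1)))
        else:
            m = INDEX_RE.search(label)
            if m:
                antecedents[m.group(1)] = label
    # Resolve only after the full pass, so antecedent order does not matter.
    return [(leaf, antecedents.get(idx)) for leaf, idx in traces]


if __name__ == "__main__":
    # Hand-written example: a relative-clause trace and its wh-antecedent.
    sample = "(SBAR (WHNP-1 (WP who)) (S (NP-SBJ (-NONE- *T*-1)) (VP (VBD left))))"
    print(link_traces(sample))  # [('*T*-1', 'WHNP-1')]
```

Because antecedent labels are collected in a full pass before traces are resolved, the lookup also works when the antecedent follows the empty element in the bracketing.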

In a previous paper (Delmonte & Rotondi, 2012) I reviewed the typical mistakes present in the corpus and commented on them; I also compared XWN with the representations contained in other similar resources. In this paper I will limit myself to XWN and extend the previous analysis. In particular, in Section 2 below I will introduce and discuss at length the thorny problem of representing three-place predicates in LF. I will then draw some conclusions.

2 The Problem of Three-Place Predicates and Their Representation in LF

Logical Forms in XWN are graded at three quality levels: normal, silver and gold; the same applies to tagging and phrase structure constituency. "Normal"
