A Treebank-based Investigation of IPP-triggering Verbs in Dutch
gerundives for a total of some 12,000 non-coindexed null elements
out of some 38,000 null elements. This problem has also hampered other
attempts at producing a semantically viable corpus of logical forms directly
from a mapping of PTB, by a number of other researchers working in the
LFG framework (Guo et al., 2007) and in the HPSG and CCG frameworks, but
also in Dependency Grammar, as reported in (Nivre and Nilsson, 2005).
In Branco (2009), the author reviews a possible annotation process for a yet
to be constructed resource, which is correctly regarded as the “next generation
of semantically annotated corpora” (ibid., 6). However, since the author does
not make any reference to actual existing resources, the whole discussion
remains very theoretical. In a subsequent paper (Branco et al. 2012), the same
author presents a parser for the construction of what he calls “deep linguistic
databank, called CINTIL DeepGramBank” (ibid., 1810). In fact, the authors
depict the process of creating a Logical Form as a side effect:
“As a side effect, it permits to obtain very important payoffs: as
the deep linguistic representation of a sentence may encode as
much grammatical information as it is viable to associate to a
sentence, by constructing a deep linguistic databank one is
producing in tandem, and within the same amount of effort, a
POS-tagged corpus, a constituency TreeBank, a DependencyBank,
a PropBank, or even a LogicalFormBank.”
This is clearly an underestimation of the real problem that has to be solved
when moving from a constituency structure-based representation to other
levels of representation, where additional information needs to be added, as
we will discuss below. In the two papers by Branco quoted above, the authors
never refer to existing Logical Form resources, as if no other effort
in that direction had been made and completed by others.
All these methods go beyond the encoding of surface context-free phrase
structure trees to incorporate non-local dependencies. This option requires
recovering empty nodes and identifying their antecedents, be they traces or
long distance dependencies. But since PTB annotators themselves
intentionally refused to coindex all those cases that caused some difficulty in
the decision process, all work carried out on this resource is flawed,
semantically speaking, from the start. However, I must admit that
WN glosses are much simpler sentences than PTB sentences,
which, even when limited to sentences under 40 words, are still too complex
and not comparable to definitions.
In a previous paper (Delmonte & Rotondi, 2012) I reviewed the typical
mistakes present in the corpus and commented on them; I also compared
XWN with the representation contained in other similar resources. In this
paper I will limit myself to XWN and I will extend the previous analysis. In
particular, in section 2 below I will introduce and comment at length on the
thorny problem of representing three-place predicates in LF. Then I will add
some conclusions.
2 The Problem of Three-Place Predicates and Their
Representation in LF
Logical Forms in XWN are graded in three quality levels: normal, silver and
gold; the same applies to tagging and phrase structure constituency. "Normal"