A Treebank-based Investigation of IPP-triggering Verbs in Dutch
A Treebank-based Investigation of IPP-triggering Verbs in Dutch
A Treebank-based Investigation of IPP-triggering Verbs in Dutch
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
DeepBank: A Dynamically Annotated <strong>Treebank</strong> <strong>of</strong><br />
the Wall Street Journal<br />
Dan Flick<strong>in</strong>ger † , Yi Zhang ‡♯ , Valia Kordoni ⋄♯<br />
† CSLI, Stanford University<br />
‡ Department <strong>of</strong> Computational L<strong>in</strong>guistics, Saarland University, Germany<br />
♯ LT-Lab, German Research Center for Artificial Intelligence<br />
⋄ Department <strong>of</strong> English Studies, Humboldt University <strong>of</strong> Berl<strong>in</strong><br />
E-mail: danf@stanford.edu, yizhang@dfki.de,<br />
kordonie@anglistik.hu-berl<strong>in</strong>.de<br />
Abstract<br />
This paper describes a large on-go<strong>in</strong>g effort, near<strong>in</strong>g completion, which aims<br />
to annotate the text <strong>of</strong> all <strong>of</strong> the 25 Wall Street Journal sections <strong>in</strong>cluded<br />
<strong>in</strong> the Penn <strong>Treebank</strong>, us<strong>in</strong>g a hand-written broad-coverage grammar <strong>of</strong> English,<br />
manual disambiguation, and a PCFG approximation for the sentences<br />
not yet successfully analyzed by the grammar. These grammar-<strong>based</strong> annotations<br />
are l<strong>in</strong>guistically rich, <strong>in</strong>clud<strong>in</strong>g both f<strong>in</strong>e-gra<strong>in</strong>ed syntactic structures<br />
grounded <strong>in</strong> the Head-driven Phrase Structure Grammar framework, as well<br />
as logically sound semantic representations expressed <strong>in</strong> M<strong>in</strong>imal Recursion<br />
Semantics. The l<strong>in</strong>guistic depth <strong>of</strong> these annotations on a large and familiar<br />
corpus should enable a variety <strong>of</strong> NLP-related tasks, <strong>in</strong>clud<strong>in</strong>g more direct<br />
comparison <strong>of</strong> grammars and parsers across frameworks, identification<br />
<strong>of</strong> sentences exhibit<strong>in</strong>g l<strong>in</strong>guistically <strong>in</strong>terest<strong>in</strong>g phenomena, and tra<strong>in</strong><strong>in</strong>g <strong>of</strong><br />
more accurate robust parsers and parse-rank<strong>in</strong>g models that will also perform<br />
well on texts <strong>in</strong> other doma<strong>in</strong>s.<br />
1 Introduction<br />
This paper presents the English DeepBank, an on-go<strong>in</strong>g project whose aim is to<br />
produce rich syntactic and semantic annotations for the 25 Wall Street Journal<br />
(WSJ) sections <strong>in</strong>cluded <strong>in</strong> the Penn <strong>Treebank</strong> (PTB: [16]). The annotations are<br />
for the most part produced by manual disambiguation <strong>of</strong> parses licensed by the<br />
English Resource Grammar (ERG: [10]), which is a hand-written, broad-coverage<br />
grammar for English <strong>in</strong> the framework <strong>of</strong> Head-driven Phrase Structure Grammar<br />
(HPSG: [19]).<br />
Large-scale full syntactic annotation has for quite some time been approached<br />
with mixed feel<strong>in</strong>gs by researchers. On the one hand, detailed syntactic annotation<br />
can serve as a basis for corpus-l<strong>in</strong>guistic study and improved data-driven NLP<br />
85