06.07.2014 Views

A Treebank-based Investigation of IPP-triggering Verbs in Dutch

A Treebank-based Investigation of IPP-triggering Verbs in Dutch

A Treebank-based Investigation of IPP-triggering Verbs in Dutch

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

. daar hebben we toen geprobeerd te bellen ...<br />

there have we then try-PSP to call<br />

‘Then we have tried to call there’ (fna000260__277)<br />

By means <strong>of</strong> a treebank-<strong>based</strong> <strong>in</strong>vestigation this paper aims to set up a typology <strong>of</strong><br />

<strong>Dutch</strong> <strong>IPP</strong>-<strong>trigger<strong>in</strong>g</strong> verbs. The results are used to verify the exist<strong>in</strong>g descriptive<br />

approaches as well as to ga<strong>in</strong> more <strong>in</strong>sight <strong>in</strong> (the frequency <strong>of</strong>) the phenomenon.<br />

Furthermore, the results turn out to be useful for the creation and optimization<br />

<strong>of</strong> NLP applications, s<strong>in</strong>ce most verbs under <strong>in</strong>vestigation also have a ma<strong>in</strong> verb<br />

function besides their verb-select<strong>in</strong>g function.<br />

We start from the division <strong>in</strong>to subject rais<strong>in</strong>g, object rais<strong>in</strong>g, subject control,<br />

and object control verbs, follow<strong>in</strong>g Sag et al. [6]. Subject rais<strong>in</strong>g verbs, such as<br />

kunnen ‘can’ <strong>in</strong> (2), do not assign a semantic role to their subject, whereas subject<br />

control verbs, such as proberen ‘try’ <strong>in</strong> (3), do. 1 Similarly, object rais<strong>in</strong>g verbs,<br />

such as the perception verb zien ‘see’, do not assign a semantic role to their object,<br />

whereas object control verbs, such as vragen ‘ask’ <strong>in</strong> (1), do.<br />

2 Data and Methodology<br />

In order to get a data-driven classification <strong>of</strong> <strong>IPP</strong>-triggers, we have explored two<br />

manually corrected treebanks: LASSY Small (van Noord et al. [8]) for written<br />

<strong>Dutch</strong>, and the syntactically annotated part <strong>of</strong> CGN (Hoekstra et al. [4]) for spoken<br />

<strong>Dutch</strong>. Each <strong>of</strong> those treebanks conta<strong>in</strong>s ca. one million tokens. Us<strong>in</strong>g treebanks<br />

<strong>in</strong>stead <strong>of</strong> ‘flat’ corpora is <strong>in</strong>terest<strong>in</strong>g for this topic, s<strong>in</strong>ce it is possible to look for<br />

syntactic dependency relations without def<strong>in</strong><strong>in</strong>g a specific word order. In ma<strong>in</strong><br />

clauses the f<strong>in</strong>ite verb and the f<strong>in</strong>al verb cluster are not necessarily adjacent, which<br />

makes such constructions hard to retrieve us<strong>in</strong>g str<strong>in</strong>g-<strong>based</strong> search.<br />

Both treebanks can be queried with XPath, a W3C standard for query<strong>in</strong>g XML<br />

trees. 2 The tools used for treebank m<strong>in</strong><strong>in</strong>g are GrETEL (August<strong>in</strong>us et al. [1]) and<br />

Dact (de Kok [2]). 3 Exploratory research, such as f<strong>in</strong>d<strong>in</strong>g relevant constructions<br />

and build<strong>in</strong>g general queries (XPath expressions) was done by means <strong>of</strong> GrETEL,<br />

a user-friendly query eng<strong>in</strong>e. In a follow<strong>in</strong>g step, Dact was used to ref<strong>in</strong>e the<br />

queries and to exam<strong>in</strong>e the data <strong>in</strong> detail.<br />

Figure 1 presents two general queries. In order to retrieve <strong>IPP</strong>-triggers, we<br />

looked for constructions with a form <strong>of</strong> hebben or zijn (perfect tense auxiliaries) as<br />

head (HD) that have an <strong>in</strong>f<strong>in</strong>itival verbal complement (VC) as sister node. That VC<br />

node conta<strong>in</strong>s the <strong>IPP</strong>-trigger as well as another VC, which can take on the form<br />

<strong>of</strong> a bare <strong>in</strong>f<strong>in</strong>itive or a te-<strong>in</strong>f<strong>in</strong>itive, cf. Figure 1a. We also looked for similar<br />

constructions without <strong>IPP</strong>, i.e. constructions with an auxiliary <strong>of</strong> the perfect, a past<br />

1 One way to differentiate subject rais<strong>in</strong>g from subject control verbs is test<strong>in</strong>g whether the verbs<br />

can comb<strong>in</strong>e with non-referential subjects (e.g. Het beg<strong>in</strong>t te regenen ‘It starts to ra<strong>in</strong>’ vs. *Het<br />

probeert te regenen ‘*It tries to ra<strong>in</strong>’).<br />

2 http://www.w3.org/TR/xpath<br />

3 http://nederbooms.ccl.kuleuven.be/eng/gretel; http://rug-compl<strong>in</strong>g.github.com/dact<br />

8

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!