12.09.2013 Views

Programme booklet (pdf)



CLIN 21 – CONFERENCE PROGRAMME

Towards a language-independent data-driven compound decomposition tool

Abstract

Réveil, Bert¹ and Macken, Lieve²

¹ ELIS, Ghent University

² LT3, Language and Translation Technology Team

Compounding is a highly productive process in Dutch that poses a challenge for various NLP applications such as terminology extraction, continuous speech recognition, and automated word alignment. The present work therefore proposes a language-independent, data-driven decomposition tool that tries to segment compounds into their meaningful parts.

The basic version of this tool first determines a list of eligible compound constituents (so-called heads and tails), relying solely on word frequency information extracted from a large text corpus. The decomposition algorithm then recursively attempts to decompose the compounds, allowing only two-part head-tail divisions in each iteration. For example, the noun 'postzegelverzamelaar' is first split into 'postzegel' + 'verzamelaar', after which 'postzegel' is further decomposed into 'post' + 'zegel'.
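
The recursive two-part splitting described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy frequency table, the minimum length and frequency thresholds, and the rule for choosing between competing splits (the split whose rarer part is most frequent) are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import Dict, List, Optional, Tuple


def best_split(word: str, freq: Dict[str, int],
               min_len: int = 3, min_freq: int = 5) -> Optional[Tuple[str, str]]:
    """Return the best two-part head-tail split of `word`, or None.

    A part is eligible when its corpus frequency is at least `min_freq`
    (assumed threshold) and it has at least `min_len` characters.
    """
    best: Optional[Tuple[str, str]] = None
    best_score = 0
    for i in range(min_len, len(word) - min_len + 1):
        head, tail = word[:i], word[i:]
        head_freq, tail_freq = freq.get(head, 0), freq.get(tail, 0)
        if head_freq < min_freq or tail_freq < min_freq:
            continue
        # Assumed scoring rule: a split is as good as its rarer part.
        score = min(head_freq, tail_freq)
        if score > best_score:
            best, best_score = (head, tail), score
    return best


def decompose(word: str, freq: Dict[str, int],
              min_len: int = 3, min_freq: int = 5) -> List[str]:
    """Recursively split `word`, allowing one head-tail division per step."""
    split = best_split(word, freq, min_len, min_freq)
    if split is None:
        return [word]  # no eligible division: treat the word as atomic
    head, tail = split
    return (decompose(head, freq, min_len, min_freq)
            + decompose(tail, freq, min_len, min_freq))


# Toy frequency counts (invented numbers, not taken from any corpus).
TOY_FREQ = {"post": 500, "zegel": 120, "postzegel": 80, "verzamelaar": 60}

print(decompose("postzegelverzamelaar", TOY_FREQ))
# ['post', 'zegel', 'verzamelaar']
```

With this toy table the first call splits the word into 'postzegel' + 'verzamelaar', and the recursive call on 'postzegel' yields 'post' + 'zegel', mirroring the example in the abstract. Recursion terminates because every part is strictly shorter than the word it came from.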

Apart from the basic version, an extended version of the tool is assessed that uses PoS information to restrict the list of possible heads and tails. The performance of both versions is evaluated in two large-scale decomposition experiments, one on the E-lex compound list and one on a word list that contains specific vocabulary from the automotive domain. As the presented decomposition tool relies only on word frequency and PoS information, it is expected that the tool can be easily adapted to new domains and languages.
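
One way to picture the PoS restriction in the extended version is as a filter on the candidate lists. The sketch below is only a guess at the general idea: the tag names, the sets of tags allowed for heads and tails, the frequency threshold, and the toy lexicon are hypothetical and do not come from the abstract. "Head" is read here as the left part and "tail" as the right part of a split, following the abstract's example.

```python
from typing import Dict, Set, Tuple

# Hypothetical tag inventory and restrictions (assumptions for illustration).
ALLOWED_HEAD_POS: Set[str] = {"N", "V", "ADJ"}
ALLOWED_TAIL_POS: Set[str] = {"N"}


def restrict_constituents(lexicon: Dict[str, Tuple[int, Set[str]]],
                          min_freq: int = 5) -> Tuple[Set[str], Set[str]]:
    """Build head and tail lists from (frequency, PoS tags) entries.

    A word becomes a candidate head or tail only if it is frequent enough
    and carries at least one tag allowed for that position.
    """
    heads: Set[str] = set()
    tails: Set[str] = set()
    for word, (count, tags) in lexicon.items():
        if count < min_freq:
            continue
        if tags & ALLOWED_HEAD_POS:
            heads.add(word)
        if tags & ALLOWED_TAIL_POS:
            tails.add(word)
    return heads, tails


# Toy lexicon with invented counts and tags.
TOY_LEXICON = {
    "post": (500, {"N"}),
    "zegel": (120, {"N"}),
    "verzamelaar": (60, {"N"}),
    "snel": (300, {"ADJ"}),   # adjective: allowed as head only
    "en": (900, {"CONJ"}),    # frequent function word: excluded from both
}

heads, tails = restrict_constituents(TOY_LEXICON)
print(sorted(heads))  # ['post', 'snel', 'verzamelaar', 'zegel']
print(sorted(tails))  # ['post', 'verzamelaar', 'zegel']
```

Under these assumptions, a very frequent function word such as 'en' never enters either list, so it cannot produce a split however high its corpus frequency is.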

Corresponding author: breveil@elis.ugent.be
