Programme booklet (pdf)
CLIN 21 – CONFERENCE PROGRAMME<br />
Towards a language-independent data-driven compound<br />
decomposition tool<br />
Abstract<br />
Réveil, Bert 1 and Macken, Lieve 2<br />
1 ELIS, Ghent University<br />
2 LT3, Language and Translation Technology Team<br />
Compounding is a highly productive process in Dutch that poses a challenge for various<br />
NLP applications such as terminology extraction, continuous speech recognition, and<br />
automated word alignment. The present work therefore proposes a language-independent,<br />
data-driven decomposition tool that tries to segment compounds into<br />
their meaningful parts.<br />
The basic version of this tool initially determines a list of eligible compound<br />
constituents (so-called heads and tails), relying solely on word frequency information<br />
that is extracted from a large text corpus. The decomposition algorithm then<br />
recursively attempts to decompose the compounds, allowing only two-part head-tail<br />
divisions in each iteration. E.g. the noun 'postzegelverzamelaar' is first split into<br />
'postzegel' + 'verzamelaar', followed by an additional decomposition of 'postzegel' into<br />
'post' + 'zegel'.<br />
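The recursive two-part splitting described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_constituents`, `decompose`), the thresholds `min_freq` and `min_len`, the use of a single constituent list for both heads and tails, and the rule of preferring the split whose rarer part is most frequent are all assumptions made for illustration. The abstract only states that eligible constituents are derived from corpus word frequencies and that each iteration allows one head-tail division.

```python
from collections import Counter


def build_constituents(corpus_tokens, min_freq=2, min_len=3):
    """Collect eligible constituents using word frequency alone.

    min_freq and min_len are hypothetical thresholds, not values
    taken from the abstract.
    """
    counts = Counter(tok.lower() for tok in corpus_tokens)
    return {w: c for w, c in counts.items()
            if c >= min_freq and len(w) >= min_len}


def decompose(word, constituents):
    """Recursively split a word, one head-tail division per step."""
    best = None
    for i in range(1, len(word)):
        head, tail = word[:i], word[i:]
        if head in constituents and tail in constituents:
            # Assumed scoring: frequency of the rarer of the two parts.
            score = min(constituents[head], constituents[tail])
            if best is None or score > best[0]:
                best = (score, head, tail)
    if best is None:
        return [word]  # no admissible split: treat as a simplex word
    _, head, tail = best
    return decompose(head, constituents) + decompose(tail, constituents)


# Toy corpus (invented counts, for illustration only).
tokens = (["post"] * 5 + ["zegel"] * 4 +
          ["postzegel"] * 6 + ["verzamelaar"] * 3)
constituents = build_constituents(tokens)
parts = decompose("postzegelverzamelaar", constituents)
print(parts)
```

With this toy corpus, 'postzegelverzamelaar' is first divided into 'postzegel' + 'verzamelaar', and 'postzegel' is then divided into 'post' + 'zegel', mirroring the example in the text. The sketch ignores Dutch linking elements such as -s- or -en-, which a practical tool would likely need to handle.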
Apart from the basic version, an extended version of the tool is assessed that uses PoS<br />
information as a means to restrict the list of possible heads and tails. The performance<br />
of both versions is evaluated in two large-scale decomposition experiments, one on the<br />
E-lex compound list and one on a word list that contains specific vocabulary from the<br />
automotive domain. As the presented decomposition tool only relies on word<br />
frequency and PoS information, it is expected that the tool can be easily adapted to<br />
new domains and languages.<br />
Corresponding author: breveil@elis.ugent.be