Programme booklet (pdf)
CLIN 21 – CONFERENCE PROGRAMME
Using easy distributed computing for data-intensive processing
Van den Bogaert, Joachim
Centre for Computational Linguistics, K.U. Leuven
Abstract
Computing useful data from large corpora means handling large amounts of data, and
running parallel code with traditional parallel computing is difficult and costly. We will
therefore present different frameworks that facilitate easy distributed computing. Using
string-to-tree alignment (GHKM), frequent subtree mining, and distributed Moses decoding
as example cases, we will demonstrate how applications and algorithms can be scaled up
and scaled out with these frameworks. We will consider both the creation of an
embarrassingly parallel solution and the redesign of an existing algorithm to fit the
MapReduce paradigm.
Corresponding author: joachim@ccl.kuleuven.be
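As an illustration of the two approaches named in the abstract, the following is a minimal sketch and not the author's implementation. It assumes that word bigrams stand in for the subtrees of frequent subtree mining and that a local thread pool stands in for a cluster; the names map_shard, reduce_counts, and count_bigrams are hypothetical. The map step is embarrassingly parallel because each shard is counted without shared state, while the reduce step merges the partial counts in the MapReduce style.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_shard(sentences):
    """Map step: count word bigrams in one corpus shard, independently of all others."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        # Pair each token with its successor to form bigrams.
        counts.update(zip(tokens, tokens[1:]))
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial count tables by summing counts per key."""
    return left + right


def count_bigrams(shards, workers=2):
    # Assumption: a thread pool stands in for the worker nodes of a cluster.
    # No shard needs data from another shard, so the map step needs no coordination.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_shard, shards))
    # Only this step combines results from different shards.
    return reduce(reduce_counts, partials, Counter())


# Toy corpus split into two shards (illustrative data only).
shards = [
    ["the cat sat", "the cat ran"],
    ["the dog sat", "the cat sat"],
]
totals = count_bigrams(shards)
```

Because reduce_counts is associative and commutative, the partial tables can be merged in any order, which is the property that allows the reduce step itself to be distributed.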