The current state of work on the Polish-Ukrainian Parallel Corpus
In its current state, PolUKR already supports searching for translation equivalents and can be used as a translation memory database by human translators, researchers, and machines. The corpus can nevertheless be enhanced in a number of ways, such as alignment at a finer level and enrichment with further annotation of different types, including semantic and referential information. Automatic word-level alignment can be of significant help in compiling bilingual dictionaries. The search engine will also have to be adjusted to enable searching for the new information.

Literature

Broda Bartosz, Piasecki Maciej & Radziszewski Adam. Towards a Set of General Purpose Morphosyntactic Tools for Polish. Proceedings of Intelligent Information Systems, Zakopane, Poland, 2008. Institute of Computer Science PAS, 2008.

Ivan Derzhanski and Natalia Kotsyba. The Category of Predicatives in the Light of Consistent Morphosyntactic Tagging of Slavic Languages. Proceedings of the International Workshop within MONDILEX project, Moscow, 2-4 October 2008.

Hunalign - sentence level aligner: http://mokk.bme.hu/resources/hunalign.

Natalia Kotsyba, Olha Shypnivska and Magdalena Turska. Linguistic principles of organizing a common morphological tagset for PolUKR (Polish-Ukrainian Parallel Corpus). Proceedings of Intelligent Information Systems, Zakopane, Poland, 2008. Institute of Computer Science PAS, 2008.

Adam Przepiórkowski and Marcin Woliński. A Flexemic Tagset for Polish. In: The Proceedings of the Workshop on Morphological Processing of Slavic Languages, EACL 2003. http://nlp.ipipan.waw.pl/~adamp/Papers/2003-eacl-ws12/ws12.pdf

Michał Rudolf. Metody automatycznej analizy korpusu tekstów polskich. Pozyskiwanie, wzbogacanie i przetwarzanie informacji lingwistycznych. Warszawa, 2004.

TextAlign in MT2007 (Memory Translation Computer Aided Tool): http://mt2007-cat.ru/index.html.

Magdalena Turska and Natalia Kotsyba. Polsko-Ukraiński korpus równoległy (PolUKR). „Materiały LXIII Zjazdu Polskiego Towarzystwa Językoznawczego”, Warszawa.

Magdalena Turska and Natalia Kotsyba. Polish-Ukrainian Parallel Corpus and its Possible Applications, Proceedings of the International Conference "Practical Applications in Language and Computers, 7-9 April, Łódź", Peter Lang GmbH, 2007.

v. Waldenfels, R. Compiling a parallel corpus of slavic languages. Text strategies, tools and the question of lemmatization in alignment. In: Brehmer, B., Zdanova, V., Zimny, R. (Hrsg.); Beiträge der Europäischen Slavistischen Linguistik (POLYSLAV) 9. München, 123-138, 2006.

Коциба Наталія. Принципи морфосинтактичного таґування польсько-українського паралельного корпусу (PolUKR). Proceedings of the International Conference “MegaLing'2008. Horizons of Applied Linguistics and Linguistic Technologies, Parthenit – Crimea, Ukraine, September 2008”, 2009 (in preparation).