The current state of work on the Polish-Ukrainian Parallel Corpus
Широков В.А., О.В. Бугаков, Т.О. Грязнухіна, О.М. Костишин, М.Ю. Кригін, Т.П. Любченко, О.Г. Рабулець, О.О. Сидоренко, Н.М. Сидорчук, І.В. Шевченко, О.О. Шипнівська, К.М. Якименко. Корпусна лінгвістика. Київ: Довіра, 2005.

Abstract

The article describes the present state of work on PolUKR, the Polish-Ukrainian parallel corpus, developed in the Institute of Slavic Studies of the Polish Academy of Sciences since 2004. It presents the ways in which the bitexts were acquired, their structure and pre-processing stages; the solutions concerning the common morphosyntactic annotation pattern for Polish and Ukrainian, as well as the annotation methods; the alignment format; and the software used or developed for the needs of the corpus.

Recommendations

One of the objectives of the current project is to develop a scheme for creating a parallel corpus for any pair of Slavic languages. At the moment, a researcher who deals with Slavic parallel corpora faces several major problems that need to be attended to. One of the still unresolved issues is a common morphological annotation tagset for Slavic languages, which should ensure a uniform search through both parts of a corpus at the same time. Technical bilingual dictionaries for sentence alignment, as well as a user-friendly alignment editor, are necessary to enable controllable, high-quality alignment. A free, platform-independent search engine for parallel corpora is also needed.
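
To illustrate why a common tagset matters for uniform search, the following is a minimal Python sketch, not the PolUKR implementation. It assumes that each language keeps its own tags and that a mapping onto a shared tagset exists. The tag names, the `TO_COMMON` table, the `Token` class, and the two-sentence `CORPUS` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    form: str  # surface form as it appears in the text
    tag: str   # language-specific part-of-speech tag (hypothetical)


# Hypothetical mapping from language-specific tags to a shared tagset.
TO_COMMON = {
    "pl": {"subst": "NOUN", "fin": "VERB", "adj": "ADJ"},
    "uk": {"імен": "NOUN", "дієсл": "VERB", "прикм": "ADJ"},
}

# Toy aligned corpus: each entry is (Polish tokens, Ukrainian tokens).
CORPUS = [
    ([Token("Kot", "subst"), Token("śpi", "fin")],
     [Token("Кіт", "імен"), Token("спить", "дієсл")]),
    ([Token("Czyta", "fin")],
     [Token("Читає", "дієсл")]),
]


def matching_forms(lang, tokens, tag):
    """Return the forms whose language-specific tag maps to the common tag."""
    mapping = TO_COMMON[lang]
    # Unknown tags fall back to "X" so they never match a real query.
    return [t.form for t in tokens if mapping.get(t.tag, "X") == tag]


def search(corpus, tag):
    """Find aligned pairs in which both sides contain the common tag."""
    hits = []
    for index, (pl_tokens, uk_tokens) in enumerate(corpus):
        pl_hits = matching_forms("pl", pl_tokens, tag)
        uk_hits = matching_forms("uk", uk_tokens, tag)
        if pl_hits and uk_hits:
            hits.append((index, pl_hits, uk_hits))
    return hits


if __name__ == "__main__":
    for hit in search(CORPUS, "NOUN"):
        print(hit)
```

With a shared tagset, a single query such as `search(CORPUS, "NOUN")` addresses both halves of the corpus at once. Without it, the user would have to phrase the query separately in each language's tag vocabulary.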