The current state of work on the Polish-Ukrainian Parallel Corpus
iałoruszczyzna @ все, що білоруське
bibułka @ папіросний папір
bibułka @ цигарковий папір
bibułomania @ манія збирати старі рукописи
biczować @ батожити
biczowanie @ батоження
biczyk @ батіжок
biczykowaty @ подібний до батіжка
bić @ бити
biję @ б'ю
biec @ бігти

Since both Polish and Ukrainian are highly inflected languages, basic dictionary forms are not enough. Either we need lemmatized texts, or a dictionary with all possible forms generated. The first option seems to be easier to realize, but for this we need to adjust the alignment algorithm and to work with already annotated texts.

Another option for aligning is TextAlign, a user-friendly software tool with a GUI and editing capabilities. The only possible input format there is RTF (Rich Text Format); the output is a TMX file with an intertwined parallel text.
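To illustrate why base forms alone fall short, and how the lemmatization option discussed above might help, here is a minimal Python sketch. It parses a few "Polish @ Ukrainian" entries taken from the dictionary sample and looks up a text token with and without lemmatization. The `TOY_LEMMAS` table and the function names `parse_dictionary` and `lookup` are hypothetical and introduced only for illustration. In practice the lemmas would come from a morphological analyser or an annotated text, and this is not the mechanism used by Hunalign or TextAlign.

```python
from typing import Dict, List

# A subset of the dictionary sample, one "Polish @ Ukrainian" pair per line.
SAMPLE = """\
bibułka @ папіросний папір
bibułka @ цигарковий папір
biczować @ батожити
bić @ бити
biec @ бігти
"""

# Hypothetical lemma table: a stand-in for a real morphological analyser.
TOY_LEMMAS: Dict[str, str] = {
    "biję": "bić",
    "biegnę": "biec",
    "bibułki": "bibułka",
}


def parse_dictionary(text: str) -> Dict[str, List[str]]:
    """Map each source headword to the list of its translations."""
    entries: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if " @ " not in line:
            continue  # skip blank or malformed lines
        source, target = line.split(" @ ", 1)
        entries.setdefault(source.strip(), []).append(target.strip())
    return entries


def lookup(token: str, entries: Dict[str, List[str]],
           lemmas: Dict[str, str]) -> List[str]:
    """Reduce the token to its lemma (if known), then consult the dictionary."""
    key = token.lower()
    lemma = lemmas.get(key, key)
    return entries.get(lemma, [])


entries = parse_dictionary(SAMPLE)
without_lemmas = lookup("biję", entries, {})       # inflected form is not a headword
with_lemmas = lookup("biję", entries, TOY_LEMMAS)  # found via the lemma "bić"
```

With an empty lemma table the inflected form `biję` finds no match, whereas with the toy table it resolves to the entry for `bić`. The alternative option, a dictionary with all forms generated, would move the same information out of the lemma table and into the dictionary itself.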
The main problem with the unequal number of sentences in parallel texts, which affected the quality of the results produced by the fully automatic and hardly controllable Hunalign, is compensated for by the possibility of easy and quick editing of the alignment in TextAlign. However, the sentence segmentation algorithm in TextAlign is too simple for satisfactory results.

Example of alignment results by TextAlign, pre-editing phase