A Wordnet from the Ground Up

Chapter 5. Polish WordNet Today and Tomorrow

akacja ‘black locust (false acacia)’, bez ‘lilac’, bluszcz ‘ivy’, brzoza ‘birch’, buk ‘beech’, busz ‘bush’, bylina ‘perennial’, cedr ‘cedar’, choinka ‘Christmas tree’, chrust ‘dry twigs’, chryzantema ‘chrysanthemum’, chwast ‘weed’, cis ‘yew’, cyprys ‘cypress’, darnia [a lemmatisation error; should be darń ‘sward’], drzewko ‘(small) tree’, drzewostan ‘forestation’, fiołek ‘violet’, gałązka ‘twig’, gęstwina ‘thicket’, girlanda ‘garland’, głóg ‘hawthorn’, goździk ‘carnation’, hiacynt ‘hyacinth’, irys ‘iris’, jabłoń ‘apple tree’, jawor ‘sycamore maple’, jemioła ‘mistletoe’, jeżyna ‘blackberry’, jodła ‘fir’, kaktus ‘cactus’, klon ‘maple’, koniczyna ‘clover’, konwalia ‘lily of the valley’, kora ‘bark’, korzenie ‘roots’, krokus ‘crocus’, kwiatek ‘(small) flower’, leszczyna ‘hazel’, lilia ‘lily’, listowie ‘foliage’, łyko ‘phloem’, mech ‘moss’, modrzew ‘larch’, narcyz ‘narcissus’, orchidea ‘orchid’, oset ‘thistle’, osika ‘aspen’, palma ‘palm tree’, papirus ‘papyrus’, paproć ‘fern’, platan ‘plane tree’, pnącz [a lemmatisation error; should be pnącze ‘creeper’], pnącze ‘creeper’, pokrzywa ‘nettle’, polano ‘log’, rododendron ‘rhododendron’, roślinność ‘vegetation’, sadzonka ‘seedling’, sitowie ‘rush’, słonecznik ‘sunflower’, sosna ‘pine’, stokrotka ‘daisy’, szałwia ‘sage’, szyszka ‘cone’, ściernisko ‘stubble field’, świerk ‘spruce’, topola ‘poplar’, trzcina ‘reed’, tulipan ‘tulip’, wiąz ‘elm’, wić ‘runner’, wieniec ‘wreath’, wierzba ‘willow’, winorośl ‘grape vine’, wodorost ‘alga, seaweed’, wrzos ‘heather’, zarośle ‘thicket’, źdźbło ‘blade (of grass)’, żonkil ‘daffodil’, żywopłot ‘hedge’

aktówka ‘briefcase’, atrament ‘ink’, bagaż ‘luggage’, bibuła ‘blotting paper’, bibułka ‘tissue paper’, bloczek ‘notepad’, cerata ‘oilcloth’, chlebak ‘haversack’, cyrkiel ‘compass (for drawing)’, długopis ‘ball-point pen’, dzianina ‘hosiery’, filc ‘felt’, grzechotka ‘rattle’, gumka ‘eraser’, hamak ‘hammock’, juk ‘saddle bag’, kabura ‘holster’, karton ‘carton’, klocek ‘(toy) block’, kojec ‘pen (for a child)’, kołyska ‘cradle’, koperta ‘envelope’, kredka ‘crayon’, leżak ‘deck chair’, łóżeczko ‘(small) bed’, markiza ‘awning’, mat ‘mate, matte’, mata ‘mat’, muślin ‘muslin’, namiot ‘tent’, nosze ‘stretchers’, nożyczki ‘scissors’, ołówek ‘pencil’, otomana ‘sofa’, paczuszka ‘(small) package’, pakunek ‘package’, pergamin ‘parchment’, perkal ‘gingham’, pędzel ‘brush’, pierzyna ‘duvet’, plastelina ‘plasticine’, poduszeczka ‘(small) pillow’, przybór ‘implement’, saszetka ‘sachet’, segregator ‘binder’, siodełko ‘seat’, skakanka ‘skip rope’, skoroszyt ‘folder’, spinacz ‘paper clip’, stalówka ‘nib’, stołek ‘stool’, szala ‘tray (in scales)’, sztaluga ‘easel’, tłumok ‘(large) bundle’, tobół ‘(large) bundle’, tornister ‘knapsack’, tusz ‘ink’, włóczka ‘yarn’, woreczek ‘(small) sack’, worek ‘sack’, wór ‘(large) sack’, wyściółka ‘lining, padding’, zawiniątko ‘bundle’, zwitek ‘scroll, wad, roll’, zwitka [a lemmatisation error; should be zwitek ‘scroll, wad, roll’]

Figure 5.1: Examples of groups of new lemmas created by automatic clustering

It was very hard to find a pure one-domain group, but most groups seem to fall into only two or three domains. Figure 5.1 shows two examples. This had a positive influence on the expansion process. Skimming a group usually sufficed to identify its main domains, so we could direct the expansion process first toward the missing parts in the hypernymy structure. The linguists could concentrate on a few domains and gradually expand the given hypernymy subgraphs while working with a given group. After adding some LUs to the given domain, AAA could be rerun to recompute suggestions for the still unedited lemmas; in narrow domains with deeper hypernymy structure, such as food or clothing, this increased the accuracy of suggestions and facilitated the linguists’ work. Later on, experienced linguists were able to decide for which group the slightly time-consuming recomputation was worth doing.

WNW was designed as a plug-in to the wordnet editor (Section 2.4). AAA-generated suggestions (step 6) presented as shown in Section 4.5.4 appear in a panel,
