5.1. Weaving the Full-fledged Structure

5. Selected groups of new lemmas were loaded into WNW and the Algorithm of Activation-area Attachment [AAA] was run to generate suggestions of attachment areas.
6. Linguists worked freely with the lemma groups; they browsed suggestions in any order and edited the wordnet structure.
7. At any moment of the process, linguists could re-run AAA to get possibly better suggestions for those new lemmas that had not been edited yet.
8. Linguists notified the coordinator about finishing work with particular groups; the coordinator could then analyse the results using the same WNW system (accessing it via the Internet, just like the linguists).

The whole process of extracting data sets (the sources of evidence for AAA), performed in steps 1-2, took approximately 25 days on a standard PC (3GHz, 4GB RAM, one single-core processor). The time could be reduced to 2-4 days by applying a grid of at least several PCs. This one-time operation is computationally very intensive, but it prepares all data sets except classifiers at the beginning of a long-term expansion process. It is done once for each list of new lemmas, independently of the size of the list. Classifier training, which has to be repeated several times as the wordnet grows, is much less computationally demanding than the other tasks. AAA is performed on the server, not on the linguists’ PCs.
It takes 10-20 minutes on a PC-class server.

Clustering (step 4) is optional from the point of view of the WNW application, which can work efficiently with a list of several thousand new lemmas. Clustering is necessary for people: a huge flat list is just too difficult to comprehend, and it is practically impossible to organise several weeks of work around it.

The idea behind clustering was to divide the initial list into lemma groups in such a way that each group consists of lemmas with senses belonging to one domain common to all of them (at least the intersection of the lemma senses should belong to one domain). There is no perfect clustering algorithm, but manual grouping would be too laborious to be feasible. We applied an off-the-shelf implementation of clustering algorithms in the Cluto package (Karypis, 2002). The input to the clustering algorithms consisted of values describing the semantic relatedness of lemma pairs, acquired from MSR_GRWF. We experimented with different algorithms. After a manual inspection of the results, we selected graph-based clustering. We did not evaluate the quality of clustering exhaustively: the mechanism played only a minor, supporting role. Due to the properties of the clustering algorithms, we repeated the process several times, each time getting some groups and a large set of ‘outliers’, which then became the input to another run. The obtained groups were loaded into WNW; all in all, 92 groups were constructed.
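The repeated clustering described above, where each run yields some groups and the leftover ‘outliers’ feed the next run, can be illustrated with a minimal sketch. It is not the Cluto procedure: as a stand-in for graph-based clustering it takes connected components of a thresholded relatedness graph. Lowering the threshold from run to run is an assumption made only for this illustration, since the text does not say how the runs differed. The function names, the `min_size` parameter and the relatedness scores are all hypothetical.

```python
from collections import defaultdict


def threshold_components(lemmas, relatedness, threshold):
    """Connected components of the graph that links two lemmas
    whenever their relatedness score reaches the threshold."""
    neighbours = defaultdict(set)
    for (a, b), score in relatedness.items():
        if score >= threshold and a in lemmas and b in lemmas:
            neighbours[a].add(b)
            neighbours[b].add(a)
    seen, components = set(), []
    for start in sorted(lemmas):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal from `start`
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    return components


def iterative_grouping(lemmas, relatedness, thresholds, min_size=2):
    """Run the clustering once per threshold; components smaller than
    min_size count as outliers and become the input to the next run."""
    remaining, groups = set(lemmas), []
    for threshold in thresholds:
        outliers = set()
        for component in threshold_components(remaining, relatedness, threshold):
            if len(component) >= min_size:
                groups.append(component)
            else:
                outliers |= component
        remaining = outliers
        if not remaining:
            break
    return groups, remaining


# Invented scores for illustration only (not values produced by MSR_GRWF).
toy_lemmas = {"brzoza", "buk", "sosna", "koperta", "kredka", "ołówek", "hamak"}
toy_relatedness = {
    ("brzoza", "buk"): 0.8,
    ("buk", "sosna"): 0.7,
    ("koperta", "kredka"): 0.4,
    ("kredka", "ołówek"): 0.45,
    ("sosna", "ołówek"): 0.1,
}

if __name__ == "__main__":
    groups, leftover = iterative_grouping(toy_lemmas, toy_relatedness, [0.6, 0.3])
    print(groups, leftover)
```

With the toy data, the first run groups the three tree names, the second run groups the three stationery items from the remaining outliers, and hamak stays ungrouped, mirroring the residue of outliers that a human would still have to place.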
Chapter 5. Polish WordNet Today and Tomorrow

akacja ‘black locust (false acacia)’, bez ‘lilac’, bluszcz ‘ivy’, brzoza ‘birch’, buk ‘beech’, busz ‘bush’, bylina ‘perennial’, cedr ‘cedar’, choinka ‘Christmas tree’, chrust ‘dry twigs’, chryzantema ‘chrysanthemum’, chwast ‘weed’, cis ‘yew’, cyprys ‘cypress’, darnia [a lemmatisation error; should be darń ‘sward’], drzewko ‘(small) tree’, drzewostan ‘forestation’, fiołek ‘violet’, gałązka ‘twig’, gęstwina ‘thicket’, girlanda ‘garland’, głóg ‘hawthorn’, goździk ‘carnation’, hiacynt ‘hyacinth’, irys ‘iris’, jabłoń ‘apple tree’, jawor ‘sycamore maple’, jemioła ‘mistletoe’, jeżyna ‘blackberry’, jodła ‘fir’, kaktus ‘cactus’, klon ‘maple’, koniczyna ‘clover’, konwalia ‘lily of the valley’, kora ‘bark’, korzenie ‘roots’, krokus ‘crocus’, kwiatek ‘(small) flower’, leszczyna ‘hazel’, lilia ‘lily’, listowie ‘foliage’, łyko ‘phloem’, mech ‘moss’, modrzew ‘larch’, narcyz ‘narcissus’, orchidea ‘orchid’, oset ‘thistle’, osika ‘aspen’, palma ‘palm tree’, papirus ‘papyrus’, paproć ‘fern’, platan ‘plane tree’, pnącz [a lemmatisation error; should be pnącze ‘creeper’], pnącze ‘creeper’, pokrzywa ‘nettle’, polano ‘log’, rododendron ‘rhododendron’, roślinność ‘vegetation’, sadzonka ‘seedling’, sitowie ‘rush’, słonecznik ‘sunflower’, sosna ‘pine’, stokrotka ‘daisy’, szałwia ‘sage’, szyszka ‘cone’, ściernisko ‘stubble field’, świerk ‘spruce’, topola ‘poplar’, trzcina ‘reed’, tulipan ‘tulip’, wiąz ‘elm’, wić ‘runner’, wieniec ‘wreath’, wierzba ‘willow’, winorośl ‘grape vine’, wodorost ‘alga, seaweed’, wrzos ‘heather’, zarośle ‘thicket’, źdźbło ‘blade (of grass)’, żonkil ‘daffodil’, żywopłot ‘hedge’

aktówka ‘briefcase’, atrament ‘ink’, bagaż ‘luggage’, bibuła ‘blotting paper’, bibułka ‘tissue paper’, bloczek ‘notepad’, cerata ‘oilcloth’, chlebak ‘haversack’, cyrkiel ‘compass (for drawing)’, długopis ‘ball-point pen’, dzianina ‘hosiery’, filc ‘felt’, grzechotka ‘rattle’, gumka ‘eraser’, hamak ‘hammock’, juk ‘saddle bag’, kabura ‘holster’, karton ‘carton’, klocek ‘(toy) block’, kojec ‘pen (for a child)’, kołyska ‘cradle’, koperta ‘envelope’, kredka ‘crayon’, leżak ‘deck chair’, łóżeczko ‘(small) bed’, markiza ‘awning’, mat ‘mate, matte’, mata ‘mat’, muślin ‘muslin’, namiot ‘tent’, nosze ‘stretchers’, nożyczki ‘scissors’, ołówek ‘pencil’, otomana ‘sofa’, paczuszka ‘(small) package’, pakunek ‘package’, pergamin ‘parchment’, perkal ‘gingham’, pędzel ‘brush’, pierzyna ‘duvet’, plastelina ‘plasticine’, poduszeczka ‘(small) pillow’, przybór ‘implement’, saszetka ‘sachet’, segregator ‘binder’, siodełko ‘seat’, skakanka ‘skip rope’, skoroszyt ‘folder’, spinacz ‘paper clip’, stalówka ‘nib’, stołek ‘stool’, szala ‘tray (in scales)’, sztaluga ‘easel’, tłumok ‘(large) bundle’, tobół ‘(large) bundle’, tornister ‘knapsack’, tusz ‘ink’, włóczka ‘yarn’, woreczek ‘(small) sack’, worek ‘sack’, wór ‘(large) sack’, wyściółka ‘lining, padding’, zawiniątko ‘bundle’, zwitek ‘scroll, wad, roll’, zwitka [a lemmatisation error; should be zwitek ‘scroll, wad, roll’]

Figure 5.1: Examples of groups of new lemmas created by automatic clustering

It was very hard to find a pure one-domain group, but most groups seem to fall into only two or three domains. Figure 5.1 shows two examples. This had a positive influence on the expansion process. Skimming a group usually sufficed to identify its main domains, so we could direct the expansion process first toward the missing parts in the hypernymy structure. The linguists could concentrate on a few domains and gradually expand the given hypernymy subgraphs while working with a given group.
After some LUs had been added to the given domain, AAA could be rerun to recompute suggestions for the still unedited lemmas; in narrow domains with a deeper hypernymy structure, such as food or clothing, this increased the accuracy of suggestions and facilitated the linguists’ work. Later on, experienced linguists were able to decide for which groups the slightly time-consuming recomputation was worth doing.

WNW was designed as a plug-in to the wordnet editor (Section 2.4). AAA-generated suggestions (step 6), presented as shown in Section 4.5.4, appear in a panel,