A Wordnet from the Ground Up


3.3. Evaluation

PoS        Min [%]  Avg [%]  Max [%]  Kappa
Verb       84       88.21    95       0.84
Adjective  82       88.9     95       0.85

Table 3.3: Results of a manual WBST for Polish verbs and adjectives; the evaluation was performed for (Broda et al., 2008) (May 2007)

PoS        R   Min [%]  Max [%]  Avg [%]
Noun       29  73.84    96.24    86.64
Verb       50  57.54    90.04    81.84
Adjective  43  76.24    96.24    89.94

Table 3.4: Results of human raters in WBST+H tests generated from the final version of the core plWordNet (R: the number of raters for the given test)

It is misleading to compare the results in Table 3.4 with the almost 100% in WBST+H generated from the May 2007 plWordNet. The increase from 89.29% for the June 2006 plWordNet to nearly 100% for the May 2007 plWordNet was caused by the removal of many obvious errors in broad synsets of the early version of plWordNet. In the former test, raters were misled by the many strange QA pairs it contained. So, we can assume the level of almost 100% as the starting point. Considering this, when people solve the tests, the relation between the wordnet used and the difficulty of the WBST+H test is opposite to what happens when an MSR is applied: for an MSR, the results are slightly higher for new versions of WBST+H (see Table 3.2). The test results (produced for the same MSR) stayed at approximately the same level for the subsequent versions of the core plWordNet, and increased with the present version of plWordNet, expanded semi-automatically with several thousand LUs (Section 4.5.4).

3.3.2 Enhanced WBST

In the WBST defined by Freitag et al. (2005), the elements of the answer set A that are not synonymous with Q are chosen at random from the whole wordnet. Thus, the difference in meaning between Q and the detractors is usually obvious to test-takers [3]. The test also tends to be relatively easy for a good MSR, e.g. (Piasecki et al., 2007b).
Our overall goal, however, was to construct an MSR that expresses a clear preference for the wordnet relations (focused on semantic similarity in the sense of Mohammad and Hirst (2006); see Section 3.4.2). Such an MSR could be used to extract synsets automatically, i.e. to differentiate the LUs in a synset from all other LUs that are similar but not synonymous, co-hyponyms among them. Any such MSR must therefore distinguish closely related LUs, not only those with very different meanings.

[3] The latest versions of the expanded plWordNet introduced more fine-grained distinctions between lemma senses. This made WBST+H more difficult for humans, as shown in Table 3.4 in relation to the previous test results discussed.

In modifying the WBST+H test we assumed that we needed to construct the answer set $A$ so that the non-synonyms are closer in meaning to the correct answer $a_i$ than is the case in WBST+H. Obviously, they cannot be synonyms of either $a_i$ or $Q$, but they ought to be related to both. We need to select the non-synonyms among LUs similar to $a_i$ and to $Q$. To achieve this, we decided to use the structure of the wordnet to determine similarity, and to construct a semantic similarity function $\mathit{SSF}_{WN}$ based on the plWordNet hypernymy structure:

$$\mathit{SSF}_{WN} : S \times L \to \mathbb{R} \qquad (3.2)$$

where $S$ is the set of synsets, $L$ the set of lexical units, and $\mathbb{R}$ the set of real numbers.

$\mathit{SSF}_{WN}$ takes a synset from $S$ (e.g. one including $Q$ and $a_i$) and a lexical unit $x$ (e.g. a detractor), and returns the semantic similarity value.

During the generation of the modified test, the Enhanced WBST (EWBST), non-synonyms are still selected at random, but only from the set of LUs broadly similar to $Q$ and $a_i$. A detractor $x$ is acceptable if the value of $\mathit{SSF}_{WN}(S_Q, x)$ is lower than some threshold $sim_t$, where $S_Q$ is the synset that contains $Q$ and $a_i$. We tested several wordnet-based similarity functions (Agirre and Edmonds, 2006), here implemented using plWordNet's hypernymy structure, and achieved the best result in a generated test with the following function:

$$\mathit{SSF}_{WN} = \frac{p_{min}}{2d} \qquad (3.3)$$

where $p_{min}$ is the length of a minimal path between two LUs in plWordNet, and $d$ is the maximal depth of the hypernymy hierarchy in the current version of plWordNet. The similarity threshold $sim_t = 2$ for this function has been established experimentally.
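The detractor selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy hypernymy tree, the lexical units, the function names and the threshold value used in the test are all hypothetical, and $p_{min}$ is assumed to be counted in hypernymy arcs between the synsets that contain the two LUs.

```python
import random
from collections import deque

# Hypothetical toy hypernymy tree: child synset -> hypernym synset.
HYPERNYM = {
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "poodle": "dog",
    "tree": "plant",
    "oak": "tree",
}

# Hypothetical lexical units, each assigned to one synset.
SYNSET_OF = {
    "dog": "dog", "hound": "dog",
    "cat": "cat",
    "poodle": "poodle",
    "animal": "animal",
    "tree": "tree",
    "oak": "oak",
}


def neighbours(node):
    """Synsets one hypernymy arc away from node, in either direction."""
    result = [child for child, parent in HYPERNYM.items() if parent == node]
    if node in HYPERNYM:
        result.append(HYPERNYM[node])
    return result


def min_path(start, goal):
    """Breadth-first search for p_min, counted in arcs; inf if unconnected."""
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return float("inf")


def max_depth():
    """d: the largest number of arcs from any synset up to its top node."""
    def depth(node):
        return 0 if node not in HYPERNYM else 1 + depth(HYPERNYM[node])
    return max(depth(node) for node in HYPERNYM)


def ssf_wn(synset, lu):
    """Equation (3.3): p_min / (2d) between a synset and a lexical unit."""
    return min_path(synset, SYNSET_OF[lu]) / (2 * max_depth())


def pick_detractors(q, sim_t, k, rng):
    """Randomly pick k non-synonyms x of q with SSF_WN(S_Q, x) < sim_t."""
    s_q = SYNSET_OF[q]
    pool = sorted(
        lu for lu, syn in SYNSET_OF.items()
        if syn != s_q and ssf_wn(s_q, lu) < sim_t
    )
    return rng.sample(pool, k)
```

In this toy tree, "cat" lies two arcs from the synset of "dog" and "tree" lies four arcs away, so a threshold between those two values keeps the co-hyponym as a candidate and rejects the distant LU. Synonyms such as "hound" are excluded before the threshold is applied.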
To achieve consistency between tests generated from different versions of plWordNet, we decided to set $sim_t$ to the value corresponding to four arcs in the hypernymy hierarchy.

The hypernymy structure of nouns in plWordNet does not have a single root, because in plWordNet we have not introduced any artificial common root nodes for all nominal LUs [4]. Many methods of similarity computation require a root, however, so we have introduced a virtual one for the sake of the similarity computation, and linked to it all trees in the hypernymy forest.

We noticed that the random selection of LU detractors based on any similarity measure tends to favour LUs in hypernymy subtrees other than that of Q, if Q is located near the root. The number of LUs linked by a short path across the root is much higher than

[4] The same is the case for verbal and adjectival LUs, whose hypernymy structures are also partial and quite shallow.