A Wordnet from the Ground Up

3.3. Evaluation

the number of LUs from the subtree of Q which are located at a close distance to Q. The problem is especially visible for question LUs in small hypernymy subtrees with a limited number of hyponyms. The problem appears in the case of any similarity measure based on the path length, so we have heuristically modified the measure by adding a constant δ_R = 3 to any path going across the virtual root. Lower values of δ_R gave no visible changes, while higher values caused a large reduction of the number of QA pairs.

The difference in the level of difficulty between WBST+H and EWBST is illustrated in Figure 3.3 by an example problem generated by this method for the same QA pair: 〈majątek (property, estate), mienie (property)〉.

EWBST
Q: majątek (property, estate)
A: lokata (deposit, investment), mienie (property), obligacja (bond, stock), wkład (deposit, outlay)

WBST+H
Q: majątek (property, estate)
A: dzieciuch (child, brat), mienie (property), rynsztok (gutter), stryj (uncle, father’s brother)

Figure 3.3: Example of the difference between EWBST and WBST QA pairs

Similarly to the tests performed for WBST+H, we have assessed the influence of the evolution of plWordNet on the MSR performance in EWBST. The same algorithm of extraction was used as in the case of the former experiments: MSR_GRWF(Lin) discussed in Section 3.4. Only an MSR for nominal LUs was built, because EWBST depends strongly on the hypernymy structure. The same MSR was tested with different versions of EWBST produced from different archival versions of plWordNet. The results are presented in the joint Table 3.2. Examples of EWBST test instances are presented in Fig. 3.4.

We can observe for EWBST results a similar tendency as for WBST+H. The EWBST test becomes slightly easier as plWordNet evolves (we hope that it improves): from 64.81% to 69.75%.
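The δ_R adjustment described at the start of this section can be illustrated with a minimal sketch. It assumes a toy hypernymy forest in which each LU has at most one hypernym and path length is counted in edges. The node names, the `VIRTUAL_ROOT` label and the helper functions are hypothetical and are not taken from plWordNet or from the authors' implementation. Only the constant δ_R = 3 comes from the text.

```python
DELTA_R = 3               # constant from the text, added to paths crossing the virtual root
VIRTUAL_ROOT = "<root>"   # hypothetical label joining otherwise separate hypernymy trees


def ancestors(node, parent):
    """Return the chain node, hypernym(node), ..., VIRTUAL_ROOT."""
    chain = [node]
    while chain[-1] != VIRTUAL_ROOT:
        # A node without a recorded hypernym hangs directly under the virtual root.
        chain.append(parent.get(chain[-1], VIRTUAL_ROOT))
    return chain


def path_length(a, b, parent):
    """Edge count between a and b via their lowest common ancestor.

    If the only common ancestor is the virtual root, DELTA_R is added.
    """
    depth_b = {n: i for i, n in enumerate(ancestors(b, parent))}
    for steps_a, n in enumerate(ancestors(a, parent)):
        if n in depth_b:
            length = steps_a + depth_b[n]
            if n == VIRTUAL_ROOT:
                length += DELTA_R
            return length
    raise ValueError("no common ancestor found")  # unreachable: both chains end at the root


# Toy data (assumption): two separate subtrees, "animal" and "tree".
toy_parent = {"spaniel": "dog", "dog": "animal", "cat": "animal", "oak": "tree"}

same_tree = path_length("spaniel", "cat", toy_parent)   # path stays inside one subtree
across_root = path_length("dog", "oak", toy_parent)     # path crosses the virtual root
```

In this sketch the penalty only lengthens paths between LUs from different subtrees, so LUs linked solely through the virtual root are pushed further apart than they would be under a plain edge count.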
For EWBST, however, the increase is continuous with each version of plWordNet, whereas WBST+H shows a larger difference only between the final core plWordNet and the expanded version. The increase for EWBST may be due to the deepening of the hypernymy structure. There are two possible reasons for the observed changes of the MSR results in relation to different tests. The introduction of many specific LUs in the expanded version of plWordNet made both tests easier: specific LUs are easier to distinguish. EWBST was getting easier with the deepening hypernymic structure, as LUs grouped earlier in large vague synsets were distributed along the structure and less frequently drawn as detractors. The QA pairs generated from broad synsets were often vaguely semantically related and were harder for both tests to differentiate from the question-detractor pairs, which were often also vaguely related.

EWBST, Nouns, plWordNet 12.2006
Q: aromat (aroma)
A: bukiet (bouquet), fetor (stench), smrodek (stink (diminutive)), smród (stink)

EWBST, Nouns, plWordNet 09.2007
Q: aromat (aroma)
A: bukiet (bouquet), fetor (stench), powódź (reason), upał (heat)

EWBST, Nouns, plWordNet 1.0
Q: aromat (aroma)
A: bukiet (bouquet), piorun (thunderbolt), widmo (phantom), zadymka (snowstorm)

WBST+H, Nouns, plWordNet 1.0
Q: aromat (aroma)
A: bukiet (bouquet), faworyzowanie (favouring), harówka (drudgery), matematyka (mathematics)

Figure 3.4: Examples of QA pairs with detractors generated from different versions of plWordNet for the same QA pair

We also tested raters’ performance on EWBST for the needs of future comparisons with the performance of the automatically extracted MSRs. During the first experiment, an example EWBST test generated from the March 2007 plWordNet was given to 32 native speakers of Polish, all of them Computer Science students⁵. The test consisted of 99 QA pairs. All LUs in the test were selected from 5706 single-word noun LUs in plWordNet. In the set of question LUs, 42 LUs occurred more than 1000 times in the IPI PAN corpus (Przepiórkowski, 2004). This subset was distinguished in the test, because such LUs are also the basis of the comparison with the results achieved in (Freitag et al., 2005).

For all QA pairs the average result was 70%, with the 61.62% minimum, 78.79% maximum and σ = 4.07% standard deviation from the mean.
For the subset consisting of frequent LUs, the average result was 63.24%, with the minimum 52.38%, maximum 73.81% and σ = 5.37%.

⁵ As in experiments with WBST+H, this bias in the background should not influence the results, because the test was composed from plWordNet which at present includes only general Polish vocabulary.
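The rater statistics above (mean, minimum, maximum and σ of per-rater scores) can be reproduced in form with a short sketch. The per-rater counts of correct answers below are hypothetical, since the individual rater results are not given in the text. Only the test sizes (99 QA pairs, 42 frequent question LUs) come from the source. Whether σ was computed as a population or a sample standard deviation is not stated, so the population form used here is an assumption.

```python
from statistics import mean, pstdev


def summarize(correct_counts, n_items):
    """Summarize per-rater accuracy (in percent) on a test with n_items QA pairs."""
    scores = [100.0 * c / n_items for c in correct_counts]
    return {
        "mean": round(mean(scores), 2),
        "min": round(min(scores), 2),
        "max": round(max(scores), 2),
        # Assumption: population standard deviation; the source does not say which form was used.
        "sigma": round(pstdev(scores), 2),
    }


# Hypothetical counts of correct answers for five raters on the 99-pair test.
hypothetical_counts = [61, 66, 69, 72, 78]
summary_all = summarize(hypothetical_counts, 99)

# Hypothetical counts for two raters on the 42-pair frequent-LU subset.
summary_frequent = summarize([22, 31], 42)

print(summary_all)
print(summary_frequent)
```

As a consistency check on the reported figures, 61 and 78 correct answers out of 99 give 61.62% and 78.79%, and 22 and 31 correct answers out of 42 give 52.38% and 73.81%, which matches the minima and maxima quoted above.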