A Wordnet from the Ground Up

4.4. Benefits of Extracted Patterns

            Hum. eval.   Ranking             Prec. plWN       Rel. R   Inst.
            [%]          70%   60%   50%     [%]    Inst.
ESP-        39           8     22    43      36     501       1.0      3982
ESP-nm      47           5     14    62      37     561       1.54     6435
ESPmorf-    45           13    18    71      39     361       0.75     2600
ESPfree-    43           9     12    23      29     567       1.36     4621
EST-        54           10    27    –       30     651       1.71     4917
EST-nm      59           42    90    –       35     571       1.7      4457
EST+nm      37           18    32    52      27     1312      2.38     10000

Table 4.2: The influence of the extended reliability measure and changes in the pattern form (“Hum. eval.” – precision based on human judgement, “Ranking” – the number of the top instances above the precision threshold, “Prec. plWN” – precision in relation to plWordNet, “Rel. R” – recall relative to ESP-)

manual evaluation. It means that EST-nm, starting from the same seeds acquired from plWordNet, goes beyond the source and extracts many instances which are not described in plWordNet. This is a very promising feature for the potential application in expanding plWordNet.

We also observed that the value of the original reliability measure (4.1) decreases very fast. After the sixth iteration it goes far below 10^-12. This explains the drop in the number of newly extracted instances. Applying the modified reliability formula (4.6) circumvents the problem.

Another matter of concern is the scheme of the patterns adjusted for Polish. It is clear that applying the adjusted patterns produces better precision: compare EST- and EST-nm with ESP- and ESP-nm. In the case of EST+nm, which utilises the generic patterns, the precision is lower, but its relative recall shows its potential in extracting new instances. At the cost of reduced precision, the number of extracted instances increases by a factor of 2.38 (the total number of the extracted instances depends on the number of instances above the threshold).

The second group of experiments was performed only for Estratto using generic patterns and the extended reliability measure, i.e. for EST+nm. The aim was to determine the influence of the algorithm parameters on the result.
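The “Rel. R” column of Table 4.2 can be approximated with a short calculation. The sketch below assumes the relative-recall definition commonly used with Espresso-style algorithms, namely the ratio of estimated correct instances, (P_A × |A|) / (P_B × |B|), with ESP- as the baseline B. This section does not state the formula, so the definition, the `Run` class and the `relative_recall` function are illustrative assumptions rather than the authors' procedure. The precision and instance counts are copied from the “Hum. eval.” and “Inst.” columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One row of Table 4.2 (hypothetical container, not from the book)."""
    name: str
    precision_pct: float  # "Hum. eval." column, in percent
    instances: int        # last "Inst." column: number of extracted instances


def relative_recall(run: Run, baseline: Run) -> float:
    """Assumed definition: ratio of the estimated numbers of correct instances."""
    correct_run = run.precision_pct * run.instances
    correct_baseline = baseline.precision_pct * baseline.instances
    return correct_run / correct_baseline


ESP = Run("ESP-", 39, 3982)  # baseline row
RUNS = [
    Run("ESPmorf-", 45, 2600),
    Run("EST-", 54, 4917),
    Run("EST-nm", 59, 4457),
    Run("EST+nm", 37, 10000),
]

for run in RUNS:
    print(f"{run.name:10s} relative recall vs ESP-: {relative_recall(run, ESP):.2f}")
```

Under this assumption the EST+nm row gives roughly 2.38, the factor quoted above. The assumed formula does not reproduce every row of the table exactly (ESP-nm, for instance, comes out higher than the printed 1.54), so it should be read as an approximation.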
The following dependencies were investigated:

1. the influence of the confidence threshold on the precision of instances achieved within subsequent iterations,
2. the influence of the number of seeds on the induced patterns, and then the influence of the relation between instances and patterns induced by them,
3. the influence of the number of the top k patterns selected for the next iteration on the stability of the algorithm and the precision of instances,
4. the dependency on the filtering of infrequent and very frequent patterns and instances,
5. the way in which various statistical similarity measures used in reliability calculation change the precision of the results.

                        Human Eval. [%]   Relative Recall   Instances
EST+nm:th1.0            12                0.79              24552
EST+nm:th2.6            37                1.00              10000
EST+nm:th5.2            48                0.54              4170
EST+nm:5seeds           22                0.71              11882
EST+nm:10seeds          25                0.84              12476
EST+nm:15seeds          24                0.85              13189
EST+nm:5insts/1patt     24                0.83              12773
EST+nm:10insts/1patt    29                1.03              13188
EST+nm:40insts/1patt    37                1.00              10000
EST+nm:k4               37                1.00              10000
EST+nm:k8               41                2.80              25361
EST+nm:k12              38                2.70              26501

Table 4.3: The dependence of the algorithms on the parameter values (Kurc, 2008)

In case 1 it seems that the highest threshold gives the best results (see the first three rows of Table 4.3), but too high a threshold decreases the total number of correctly extracted instances, as the relative recall drops significantly. There must, however, be a balanced ratio between the instances selected for the next iteration and the new patterns induced. With few instances, there is no statistical evidence to induce proper patterns, and EST/ESP crawls, picking almost random patterns. That leads to a decrease in precision.

Initial seeds, case 2 (marked ‘nnseeds’ in Table 4.3, where nn is the number), are meant to generate a skeleton of a model of the lexico-semantic relation. If the number of seeds is not high enough, the best extracted patterns can be random. Of course, one could collect a small number of seeds that would indicate only expected patterns, but that would require a precise analysis of the corpus used for instance extraction. That is pointless, because by using more seeds one can acquire the same patterns with less effort.

The influence of the number of instances preserved between two subsequent iterations is similar to the influence of the number of seeds; see the rows marked ‘nninsts/1patt’ in Table 4.3 (nn preserved instances for one pattern).
More instances kept, and then used for the evaluation of the patterns, give a better description of the whole model. According to the experiments, at least 15 seeds and 10 instances for