A Wordnet from the Ground Up


4.4. Benefits of Extracted Patterns

occ=88 rel=0.0060688
(hypo:subst:acc) interp który być (hyper:subst:inst)
(hypo:subst:acc) interp which is (hyper:subst:inst)

The plWordNet-related precision of the Espresso/Estratto algorithm is lower when measured on Polish corpora than the precision reported by Pantel and Pennacchiotti (2006). This might be due to a slightly different approach to precision evaluation, which was performed partially on the basis of the much smaller plWordNet. On the other hand, the results of the manual evaluation are similar to the results reported in (Pantel and Pennacchiotti, 2006). The results for different similarity measures based on reliability suggest that PMI gives the best results for the given test suite.

The adjustment of the pattern scheme to the characteristic features of Polish improved the precision over Espresso patterns using only word forms and parts of speech as features.

Estratto, the proposed modification of Espresso, succeeded in extracting hypernymy and antonymy from IPIC and the Rzeczpospolita corpus. Attempts to extract meronymy were unsuccessful. Meronymic pairs are present on the MSR-produced lists of the LUs most semantically related to a given one, but with the failure of the pattern-based attempts we do not have an additional source of knowledge to separate meronymic pairs from those lists.

We tested several parameters that have a significant influence on the Estratto algorithm. The most important of them appeared to be:
• the number of seed instances,
• the confidence threshold,
• the number of the top k patterns preserved between the subsequent iterations.

The number of seed instances should exceed 10. The confidence threshold strongly depends on the corpus; for example, for IPIC the best value found was about 2.6. Each time the algorithm is applied to a new corpus, both the seed instances and the measure of confidence must be redefined. The number of the top k patterns should be low, around 8. Such a number results in a more stable representation of the semantic relation. It is still unclear how to exploit patterns that seem to be correct and are close to the top. Those patterns usually disappear in the next iterations, so some instances are also excluded from the final results.

Espresso/Estratto is an intrinsically weakly supervised algorithm. That is true even though the preparation of an appropriate set of seeds leading to the extraction of patterns producing a large and diverse set of extracted instances might require even some initial
Chapter 4. Extracting Relation Instances

Seed instances
senator (senator)                 mówca (speaker)
nazwa (name)                      oznaczenie (designation)
Polska (Poland)                   kraj (country)
Polska (Poland)                   państwo (state)
wynagrodzenie (remuneration)      świadczenie (≈benefit)
agencja (agency)                  jednostka (unit)
akademia (academy)                uczelnia (university)
alkohol (alcohol)                 substancja (substance)
pożar (fire (conflagration))      zdarzenie (event)
należność (charge)                zobowiązanie (obligation)
protokół (protocol)               dokument (document)
dolar (dollar)                    waluta (currency)
broń (weapon)                     przedmiot (object)
uposażenie (salary)               świadczenie (benefit)
obligacja (bond)                  papier (share)
zapis (record)                    dowód (evidence)
człowiek (human)                  podmiot (subject)
żywica (resin)                    spoiwo (adhesive)

Extracted instances
szkoła (school)                   instytucja (institution)
maszyna (machine)                 urządzenie (device)
wychowawca (tutor)                pracownik (employee)
kombatant (veteran)               osoba (person)
bank (bank)                       instytucja (institution)
pociąg (train)                    pojazd (vehicle)
telewizja (television)            medium (medium)
prasa (press)                     medium (medium)
szpital (hospital)                placówka (institution)
czynsz (rent)                     opłata (payment)
grunt (land)                      nieruchomość (real estate)
Wisła (Wisła)                     rzeka (river)
świadectwo (diploma)              dokument (document)
opłata (payment)                  należność (charge)
ryba (fish)                       zwierzę (animal)
Włochy (Italy)                    kraj (country)
jezioro (lake)                    zbiornik (reservoir)
jarmark (fair)                    impreza (event)
piwo (beer)                       artykuł (product)
zasiłek (dole)                    świadczenie (benefit)
powódź (flood)                    klęska (disaster)
paszport (passport)               dokument (document)

Figure 4.5: Examples of hypernymy instances extracted by Estratto, version EST+nm
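
The roles of the top k patterns and the confidence threshold discussed in Section 4.4 can be illustrated with a minimal Python sketch of one selection step. It is not the Estratto implementation: the function names, the plain corpus-normalised PMI, the scaling by the largest absolute PMI value, and the confidence formula (a sum of PMI values weighted by pattern reliability) are all simplifying assumptions made for illustration. Only the default values of 8 patterns and a threshold of 2.6 (reported for IPIC) come from the text.

```python
import math
from collections import defaultdict

TOP_K = 8                   # "low, around 8" according to the text
CONFIDENCE_THRESHOLD = 2.6  # best value reported for IPIC; corpus-dependent


def pmi_table(cooc):
    """Map (instance, pattern) -> PMI, given co-occurrence counts.

    Assumption: standard PMI normalised by the total count, not
    necessarily the exact variant used in Espresso/Estratto.
    """
    total = sum(cooc.values())
    inst_tot, pat_tot = defaultdict(int), defaultdict(int)
    for (inst, pat), n in cooc.items():
        inst_tot[inst] += n
        pat_tot[pat] += n
    return {
        (inst, pat): math.log(n * total / (inst_tot[inst] * pat_tot[pat]))
        for (inst, pat), n in cooc.items()
    }


def pattern_reliability(pmi, seed_reliability):
    """Average PMI-weighted reliability of the seeds seen with each pattern."""
    # Scale by the largest absolute PMI so that weights stay within [-1, 1].
    max_pmi = max((abs(v) for v in pmi.values()), default=0.0) or 1.0
    scores = defaultdict(float)
    for (inst, pat), value in pmi.items():
        scores[pat] += (value / max_pmi) * seed_reliability.get(inst, 0.0)
    n_seeds = max(len(seed_reliability), 1)
    return {pat: s / n_seeds for pat, s in scores.items()}


def top_k_patterns(reliability, k=TOP_K):
    """Keep only the k most reliable patterns for the next iteration."""
    ranked = sorted(reliability.items(), key=lambda kv: kv[1], reverse=True)
    return [pat for pat, _ in ranked[:k]]


def confident_instances(pmi, reliability, kept, threshold=CONFIDENCE_THRESHOLD):
    """Score instances by the kept patterns and drop those below the threshold."""
    conf = defaultdict(float)
    for (inst, pat), value in pmi.items():
        if pat in kept:
            conf[inst] += value * reliability[pat]
    return {inst: c for inst, c in conf.items() if c >= threshold}
```

In this sketch a pattern ranked just below position k is simply discarded, together with any instance that only it supports, which mirrors the observation that near-top patterns tend to disappear in later iterations. Because the confidence scale depends on the counts in the corpus, the threshold has to be re-tuned whenever the corpus changes.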