A Wordnet from the Ground Up

4.4. Benefits of Extracted Patterns

runs of Espresso/Estratto or browsing the corpus to find occurrences of promising LU pairs. Finding the appropriate parameter values poses a similar problem. In our experience, trial runs of the algorithm for each corpus used are needed before getting results that satisfy our expectations.

Additionally, it turned out that in order to maintain a stable representation of relations, there must be an appropriate ratio between patterns and instances. The pattern:instances ratio estimated during experiments is between 1:15 and 1:20. If there are fewer instances, the algorithm becomes unstable. Using more instances results in a longer computation time.

An interesting result is the observation of the “intensifying” patterns. Such patterns do not represent any particular semantic relation. When applied alone, they extract instances of relations of multiple types. When an intensifying pattern is combined with regular ones, it delivers additional statistical evidence for correct but infrequent instances. This lifts the algorithm’s precision. An example (Polish w means “in”):

(hypo/holo:subst:nom) w (hyper/mero:subst:inst)

We observed a problem with the number of instances collected by the ESP+/EST+ versions of the algorithms that use generic patterns. This number is comparable to the number of instances extracted by ESP-/EST-, but one would expect it to be much higher. This might be a result of the characteristic features of the IPIC corpus or of the size of the validating corpus. This problem might be partially solved by using the Web as a validating corpus. Unfortunately, Polish LUs have multiple word forms, so Google queries must be more complicated. The other reason might be the limited expressive power of the patterns, an aspect of the algorithm that should be investigated.

The extended structure of Estratto patterns still seems to miss some lexico-semantic dependencies, especially in stylistically rich text.
The experiments on extracting hypernymy from the Internet-based corpus, mostly consisting of literary texts, were unsuccessful.

The first step towards strengthening patterns is to take into account possible agreements in elements of the patterns that match the instances. The patterns used in EST are very strict about grammatical categories. For example, the pattern

(hypo:subst:gen) i inny (hyper:subst:gen)

(two nouns in genitive) is treated as a completely different pattern from

(hypo:subst:inst) i inny (hyper:subst:inst)

(two nouns in instrumental).
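The role of case agreement in these two patterns can be pictured with a minimal sketch. It assumes a hypothetical string encoding in which every slot has the form (role:pos:case); the function name generalise_case and the regular expression are illustrative only and are not part of Estratto.

```python
import re

# Hypothetical encoding: a slot looks like "(role:pos:case)";
# everything outside the slots is literal pattern text.
SLOT = re.compile(r"\((?P<role>[^:()]+):(?P<pos>[^:()]+):(?P<case>[^:()]+)\)")


def generalise_case(pattern: str) -> str:
    """Replace concrete case values with agreement variables case1, case2, ...

    Slots that share a concrete case receive the same variable, so the
    agreement between slots is kept while the case value itself is dropped.
    """
    variables = {}

    def rename(match: re.Match) -> str:
        case = match.group("case")
        # The first occurrence of a case value allocates the next variable.
        var = variables.setdefault(case, f"case{len(variables) + 1}")
        return f"({match.group('role')}:{match.group('pos')}:{var})"

    return SLOT.sub(rename, pattern)


genitive = "(hypo:subst:gen) i inny (hyper:subst:gen)"
instrumental = "(hypo:subst:inst) i inny (hyper:subst:inst)"

# As plain strings the two patterns differ; after generalisation they coincide.
print(genitive == instrumental)
print(generalise_case(genitive) == generalise_case(instrumental))
```

Under this encoding, a pattern whose slots disagree in case, such as (hypo/holo:subst:nom) w (hyper/mero:subst:inst), would receive two distinct variables, so only genuine agreement is collapsed.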
It seems to be helpful to allow merging such patterns, maybe like this:

(hypo:subst:case1) i inny (hyper:subst:case1).

The results for ESP- and EST-, where there are no such strict constraints, suggest some increase in recall. Another way, much more complicated, is to enrich the pattern representation, so that additional syntactic information (at least about nominal LUs) can be used.

The list of acquired instances cannot be directly imported to plWordNet. First of all, the list is flat. There is no information on synsets. The percentage of erroneous LU pairs on the lists (such as 63% for EST+nm) is too high to trust the list as a source of automatic expansion of the plWordNet hypernymy structure. Also, many positive LU pairs in fact represent quite remote hypernymic links.

These observations show the drawbacks, but there are also pluses. EST+nm extracted 3700 hypernymic LU pairs (37% of the 10000 LU pairs). This information can be combined with MSR_GRWF, producing higher values for wordnet relation instances. The MSR alone does not say what kind of relation made two LUs closely semantically related. The information acquired by Estratto sheds light on this issue. Section 4.5.3 presents a fairly successful algorithm based on this reasoning. A manual comparison of the LU pairs extracted by Estratto and the three manual patterns reveals that the two sets are disjoint to some extent. We noted earlier that manual patterns are more expressive and can find hypernymic instances in language constructions which are inaccessible to the present Estratto patterns.
This can be changed in future extensions of Estratto, but for now we used both types of patterns in the hybrid algorithm of plWordNet expansion in Section 4.5.3.

4.5 Hybrid Combinations: Patterns, Distributional Semantics and Classifiers

We noted at the end of Section 3.4.5 that Measures of Semantic Relatedness [MSRs] can recognize semantically related LUs with an accuracy approaching human performance. Still, MSRs produce lists of the k LUs most semantically related to the given LU x [MSRlist(x,k)] with few instances of wordnet relations, and they cannot distinguish the direction of a relation. We named two ways of compensating for these drawbacks: introduce a classifier operating on MSRlist(x,k), capable of differentiating relations, or combine an MSR with other sources of knowledge, including lexico-syntactic patterns or the existing wordnet structure. This subsection will examine both possibilities.
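The second possibility, combining an MSR with pattern-based evidence, can be pictured with a minimal sketch. It is not the hybrid algorithm of Section 4.5.3. It only assumes that MSRlist(x,k) is available as a list of (LU, score) pairs and that pattern extraction yields directed (hyponym, hypernym) pairs. The function name label_msr_list, the Polish example words and all scores are invented for illustration.

```python
from typing import List, Optional, Set, Tuple

Pair = Tuple[str, str]  # (hyponym, hypernym), as extracted by patterns


def label_msr_list(
    x: str,
    msr_list: List[Tuple[str, float]],
    pattern_pairs: Set[Pair],
) -> List[Tuple[str, float, Optional[str]]]:
    """Annotate each LU on MSRlist(x, k) with pattern-based evidence.

    The MSR score says only how closely related y is to x; the pattern
    pairs add the kind and direction of the relation when they cover it.
    """
    labelled = []
    for y, score in msr_list:
        if (x, y) in pattern_pairs:
            relation = "hypernym"   # y was extracted as a hypernym of x
        elif (y, x) in pattern_pairs:
            relation = "hyponym"    # y was extracted as a hyponym of x
        else:
            relation = None         # related by the MSR, relation unknown
        labelled.append((y, score, relation))
    return labelled


# Toy data, invented for illustration (pies = dog, kot = cat,
# zwierzę = animal, jamnik = dachshund); the scores are not real MSR values.
msr_list = [("kot", 0.41), ("zwierzę", 0.35), ("jamnik", 0.30)]
pattern_pairs = {("pies", "zwierzę"), ("jamnik", "pies")}

for entry in label_msr_list("pies", msr_list, pattern_pairs):
    print(entry)
```

In this toy run the co-hyponym kot stays unlabelled although it ranks highest, which mirrors the observation that an MSR list contains few instances of wordnet relations and gives no direction on its own.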