A Wordnet from the Ground Up

3.4. Measures of Semantic Relatedness

Features                   Lin    CRMI     PMI  RWF(zscore)  GRWF(Lin)
NArg(acc)                69.17   57.52   63.68        62.51      70.86
NArg(dat)                50.00   24.54   46.16        28.19      50.10
NArg(inst)               65.37   51.52   58.32        46.44      67.97
NArg(loc)                63.02   54.67   57.71        47.78      65.81
NSb                      63.68   56.41   57.65        66.32      65.59
VPart                    55.81   51.30   53.11        54.10      56.70
VAdv                     75.21   60.06   64.00        72.44      75.49
NArg(acc+dat+inst+loc)   72.03   64.57   68.95        68.70      73.87
NSb+NArg+VPart+VAdv      74.16   64.83   70.86        75.94      75.33
AAdv                     66.15   21.57   58.77        63.86      67.02
AA                       80.41   72.16   77.25        74.95      81.90
ANmod                    81.96   75.39   80.60        83.57      82.46
ANmod+AAdv               82.33   75.08   81.34        85.00      83.26
ANmod+AA                 82.77   76.94   83.70        86.42      83.39
ANmod+AAdv+AA            84.44   76.63   83.70        86.92      86.55

Table 3.13: Experiments with MSRs for frequent lemmas (> 1000 occurrences in joined corpora)

• k = 10000 for the frequent adjectives, k = 1000 for the frequent verbs,
• k = 1000 for all adjectives, and k = 1000 for all verbs.

An automatic mechanism for adjusting the value of k on the basis of data analysis would be a valuable extension of the RWF method. It must be noted, however, that the range of results achieved for different k values is limited. For example, for frequent verbs and the joined matrix NSb+NArg+VPart+VAdv we get 73.23% for k = 100, 76.24% for k = 500, 77.12% for k = 1000 and 76.88% for k = 5000. Results become stable around k = 300, and only slight tuning is needed to find the optimal value of k. There was a similar result for nouns (Piasecki et al., 2007b).

In the case of verb constraints, the highest result for a single type of constraint is produced, surprisingly, by simple identification of the closest adverb. The NArg(dat) and NArg(inst) matrices are too sparse, and the identification of a subject generates too many errors (we do not apply any parser). For a joined matrix, however, RWF selects features effectively enough to achieve a result that is significantly better than that of any single verb matrix.

In the case of adjectives, the differences in accuracy achieved for different types of constraints are much smaller. The joined matrix is also better than any single one. Hatzivassiloglou and McKeown (1993) claim that co-occurrence of two adjectives in one noun phrase (clearly indicated in Polish by their morphosyntactic agreement) is a negative feature. This claim is contradicted by the results of AA alone and of AA combined with other matrices.
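
The role of the parameter k can be illustrated with a small sketch. This page does not restate how RWF works, so the code assumes a rank-based scheme: the k highest-weighted features of a lemma are kept and their weights replaced by the scores k, k-1, ..., 1. The cosine comparison and the names rank_weight, cosine and best_k are likewise assumptions made for illustration. Only the four accuracy figures are taken from the text above.

```python
import math

# Accuracies (%) quoted in the text for frequent verbs with the
# NSb+NArg+VPart+VAdv matrix, keyed by the value of k.
REPORTED_ACCURACY = {100: 73.23, 500: 76.24, 1000: 77.12, 5000: 76.88}


def rank_weight(weights, k):
    """ASSUMED scheme: keep the k highest-weighted features and
    replace their weights by the rank scores k, k-1, ..., 1."""
    # Sort by descending weight; ties are broken by feature name.
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return {feature: k - rank for rank, (feature, _) in enumerate(ranked[:k])}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts
    (an assumed way of comparing the transformed vectors)."""
    dot = sum(value * v[f] for f, value in u.items() if f in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_k(accuracy_by_k):
    """Return the k with the highest accuracy among the tried values."""
    return max(accuracy_by_k, key=accuracy_by_k.get)


if __name__ == "__main__":
    # Hypothetical feature weights for two lemmas.
    x = rank_weight({"f1": 3.0, "f2": 1.0, "f3": 2.0}, k=2)
    y = rank_weight({"f1": 0.5, "f3": 4.0, "f4": 0.1}, k=2)
    print(x, y, round(cosine(x, y), 3))
    print("best k among the reported values:", best_k(REPORTED_ACCURACY))
```

Choosing k by an exhaustive comparison of this kind is only practical for a handful of candidate values. The narrow spread of the reported accuracies suggests that a coarse grid would be enough.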
Features                   Lin    CRMI     PMI  RWF(zscore)  GRWF(Lin)
NArg(acc)                60.47   58.13   55.53        56.89      63.35
NArg(dat)                38.59   23.98   36.40        26.29      39.06
NArg(inst)               55.60   45.59   50.75        42.54      57.57
NArg(loc)                51.42   46.72   48.54        41.47      54.50
NSb                      53.15   54.54   47.68        57.79      54.78
VPart                    46.70   44.69   44.89        47.28      48.32
VAdv                     65.32   53.77   58.04        64.19      66.50
NArg(acc+dat+inst+loc)   65.13   65.04   60.94        64.17      68.05
NSb+NArg+VPart+VAdv      67.10   67.80   62.42        62.41      71.85
AAdv                     57.60   20.86   53.27        58.96      58.71
AA                       74.24   71.86   71.75        72.32      76.87
ANmod                    76.12   74.77   73.99        79.18      77.75
ANmod+AAdv               76.97   75.41   74.80        81.13      78.93
ANmod+AA                 78.18   78.32   78.43        83.05      79.89
ANmod+AAdv+AA            79.71   78.32   78.39        83.26      82.48

Table 3.14: Experiments with MSRs for all lemmas

The result of our best adjective MSR is very close to the result achieved by humans (Section 3.3.1). For verbs, the difference is comparable to that observed for nouns (Piasecki et al., 2007b), although the result of the verb MSR still approaches human performance.

The constructed MSRs are intended to assist linguists in selecting LUs semantically related to the LU being edited. Lexicographers can find missing synonyms or instances of lexico-semantic relations while browsing the MSRlist(x, k) lists produced by the MSRs.

Long suggestion lists may preclude careful analysis. We chose k = 20 for a small experiment to test a possible future use of both MSRs by linguists. We randomly selected two subsets of lemmas, verbs and adjectives. We determined the sample sizes in such a way that the results of the manual evaluation performed on the samples could be ascribed to the whole sets at the 95% confidence level, according to the method discussed in (Israel, 1992). For every LU in each subset, we generated the list of the k = 20 LUs most related to the given one. One of the co-authors manually assessed all elements on all lists, distinguishing any elements that are in some wordnet relation to the head LU.

The evaluated LU lists were classified into:
• very useful – a half, or almost a half, of the LUs on the list are in some semantic relation to the given one,
• useful – a sizable part of the list is somehow related,
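
The sample-size step in the evaluation described above can be sketched as follows. Israel (1992) presents more than one way of determining a sample size, and the text does not say which one was applied. The sketch assumes the simplified formula n = N / (1 + N e^2), which presumes a 95% confidence level and maximum variability. The margin of error e = 0.05 and the population sizes are illustrative assumptions, not figures from the experiment.

```python
import math


def simplified_sample_size(population, margin=0.05):
    """Sample size n = N / (1 + N * e^2), rounded up.

    ASSUMPTION: this simplified formula (95% confidence, p = 0.5) is used
    here only as an example; the formula the authors applied is not stated.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    if not 0 < margin < 1:
        raise ValueError("margin must lie strictly between 0 and 1")
    return math.ceil(population / (1 + population * margin ** 2))


if __name__ == "__main__":
    # Hypothetical numbers of lemmas in the verb and adjective sets.
    for size in (1000, 2000):
        print(size, "->", simplified_sample_size(size))
```

With a fixed margin the required sample grows slowly with the population, which is why a manual assessment of k = 20 suggestions per sampled LU remains feasible for one annotator.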