A Wordnet from the Ground Up

3.4. Measures of Semantic Relatedness 79

(Broda et al., 2009) showed that one can hardly expect to achieve a significantly better result with any other MSR.

A closer inspection of the MSRlist(x,k) lists (Tables 3.11 and 3.12 show examples) reveals, however, that although many pairs are clearly semantically related, the percentage of instances of wordnet relations is well below the psychological barrier of 50%. It is also very hard to find any clear threshold above which the MSR value guarantees that a given pair of LUs is an instance of a wordnet relation (footnote 17). These intuitions were confirmed in an experiment involving the manual analysis of 364 LU pairs from MSRlist(x,k) lists. The pairs were selected randomly from the MSR_RWF extracted from IPIC for the needs of (Derwojedowa et al., 2008). Each pair ⟨x, y⟩ such that y ∈ MSRlist(x,k) and MSR(x, y) ≥ τ_MSR (footnote 18) was manually assessed as belonging, or not, to one of the wordnet relations. Half of the pairs did not belong to any of these relations. The other half appeared to be worth browsing. In 7% of cases we found two synonyms already present in plWordNet, but only 1% were new synonym pairs. 20% of pairs were close hyponyms or hypernyms (not necessarily direct) already present in plWordNet, and 16% were newly discovered close hyponyms/hypernyms and co-hyponyms. 1% were known meronyms and holonyms, and 5% were newly discovered ones.

The size of the corpus used for MSR extraction is significant for the MSR's accuracy. We therefore repeated in 2008 the experiment with manual evaluation of the MSRlist(x,k) list.
We used an MSR_RWF extracted from the joint corpus for the final plWordNet expansion assisted by the WordNet Weaver system, see Section 4.5. 1000 LU pairs were randomly selected and evaluated by one of the co-authors, and assigned to one of the classes described below:

• 523 LU pairs were not instances of any wordnet relation, 4 LU pairs included errors caused by the morphosyntactic preprocessing (such as a non-word form generated by the morphological guesser), and 5 pairs contained an incomplete multiword LU, but in the hypernymy relation to the second member of the pair,
• 228 LU pairs included elements linked by the synonymy or hypernymy/hyponymy relation (the latter not necessarily direct); only 16 instances had been already described in plWordNet,
• 158 LU pairs were co-hyponyms or close “cousins” (indirect co-hyponyms),
• 66 LU pairs represented meronymy/holonymy, and 12 pairs were co-meronyms,
• 4 LU pairs were antonyms.

17 The MSR values seem not to be directly comparable among different target LUs for which MSRlist(x,k) lists are generated.
18 We set τ_MSR to 0.2.
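The class counts listed above can be tallied to check that they sum to 1000 and to see what share of the sampled pairs are instances of some wordnet relation. The following is only a minimal sketch: the dictionary keys are shorthand labels introduced here, and the grouping of the whole first bullet (including the 5 incomplete multiword LUs) as non-instances is an assumption, not something the text states.

```python
# Counts copied from the class list above; the key names are
# shorthand labels invented for this sketch.
counts = {
    "no wordnet relation": 523,
    "preprocessing error": 4,
    "incomplete multiword LU": 5,
    "synonymy or hypernymy/hyponymy": 228,
    "co-hyponymy (incl. cousins)": 158,
    "meronymy/holonymy": 66,
    "co-meronymy": 12,
    "antonymy": 4,
}

# Assumption: everything in the first bullet counts as a non-instance.
NON_INSTANCES = {
    "no wordnet relation",
    "preprocessing error",
    "incomplete multiword LU",
}

total = sum(counts.values())
relation_pairs = sum(n for label, n in counts.items()
                     if label not in NON_INSTANCES)
relation_share = relation_pairs / total

print(f"total pairs: {total}")
print(f"wordnet relation instances: {relation_pairs} ({relation_share:.1%})")
```

Under this grouping the share of relation instances comes out slightly below one half, in line with the roughly 50% observed in the earlier 364-pair sample.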
80 Chapter 3. Discovering Semantic Relatedness

The comparison of the two evaluations, performed on two different development versions of MSR_RWF, shows that despite the increasing accuracy of the measure in the WBST+H test, the percentage of wordnet relation instances remains stable. We need additional extraction mechanisms in order to increase the percentage of the target instances in the results and to differentiate between wordnet relations (see Chapter 4).

In line with the planned semi-automatic expansion of the adjective and verb parts of plWordNet, the respective MSRs were extracted using the joint corpus and the MSR_RWF and MSR_GRWF algorithms. The procedures followed the blueprint adopted for the nominal MSR. We acquired two sets: 4668 adjectival lemmas and 17990 verbal lemmas (footnote 19). They came from the core plWordNet (2618 and 3239, respectively), the small Polish-English dictionary (Piotrowski and Saloni, 1999) and the joint corpus (those occurring ≥ 1000 times).

Both MSRs were tested with WBST+H tests including 2814 QA pairs for adjectival lemmas and 5484 for verbal lemmas. The QA pairs encompass 1574 different adjectival lemmas (among them 959 occur over 1000 times in the joint corpus) and 2960 different verbal lemmas (1902 occur more than 1000 times). Some of them occur in QA pairs more than once but with different near-synonyms.

Tables 3.13 and 3.14 show the results for different MSRs on the same tests for LUs of different frequency. For WBST+H the baseline random selection is 25%. We divided the analysed adjectival and verbal lemmas into two groups by their frequency in IPIC: those occurring > 1000 times and the others. The results for the first group are given in Table 3.13. In Table 3.14 we present results obtained for all LUs.

Working with the same generated co-incidence matrices for verbs and adjectives, we compared the application of RWF with three other measures: Lin’s measure (Lin, 1998), CRMI (Weeds and Weir, 2005), RFF (Geffet and Dagan, 2004).
From a large number of proposed solutions, we selected only the measures based on lexico-syntactic features. Lin’s measure was included in the set because of its significant influence on the subsequent research. CRMI has been extensively compared with several other approaches showing significant improvement. RFF was chosen for the idea of feature selection present in it. RFF is calculated in two phases: in the first phase features are evaluated and the best 100 are selected, re-weighted and used in LU similarity calculation in the second phase. In all three approaches the similarity computation is based in some way on Mutual Information weighting, which is also often used by other methods. Finally, the approach of Freitag et al. (2005) is one of the few that deal with the similarity of adjectives and verbs.

In the case of RWF, we also determined experimentally the threshold k for the number of features selected achieving the best results with

19 Besides one-word lemmas, we only considered verbs paired with the reflexive particle się.
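To illustrate what a similarity computation based on Mutual Information weighting can look like, here is a minimal sketch in the spirit of Lin's measure (Lin, 1998), one of the measures compared above. It is a simplification and not the implementation used in the experiments: it applies plain pointwise mutual information to a word-by-feature count table instead of dependency triples, and the toy words, features and counts are hypothetical.

```python
import math

def pmi_weights(counts):
    """Turn raw co-occurrence counts into positive PMI weights.

    counts: dict mapping word -> dict mapping feature -> count.
    Features with non-positive PMI are dropped.
    """
    total = sum(c for feats in counts.values() for c in feats.values())
    word_total = {w: sum(feats.values()) for w, feats in counts.items()}
    feat_total = {}
    for feats in counts.values():
        for f, c in feats.items():
            feat_total[f] = feat_total.get(f, 0) + c

    weights = {}
    for w, feats in counts.items():
        weights[w] = {}
        for f, c in feats.items():
            if c <= 0:
                continue  # unseen pair, PMI undefined
            pmi = math.log(c * total / (word_total[w] * feat_total[f]))
            if pmi > 0:
                weights[w][f] = pmi
    return weights

def lin_similarity(weights, w1, w2):
    """Weights of shared features divided by the weights of all features."""
    f1, f2 = weights[w1], weights[w2]
    shared = set(f1) & set(f2)
    numerator = sum(f1[f] + f2[f] for f in shared)
    denominator = sum(f1.values()) + sum(f2.values())
    return numerator / denominator if denominator else 0.0

# Hypothetical toy data: adjectives described by the nouns they modify.
toy_counts = {
    "fast":   {"car": 8, "train": 6, "house": 1},
    "quick":  {"car": 6, "train": 5, "meal": 2},
    "wooden": {"house": 9, "table": 7, "car": 1},
}

toy_weights = pmi_weights(toy_counts)
print(lin_similarity(toy_weights, "fast", "quick"))
print(lin_similarity(toy_weights, "fast", "wooden"))
```

In this toy table "fast" and "quick" share two positively weighted features, while "fast" and "wooden" share none, so the first pair scores higher. A feature selection step such as the one described for RFF would sit between the weighting and the similarity computation; it is not shown here.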