A Wordnet from the Ground Up

3.3 Evaluation

…any pair of LUs, but also people are notoriously bad at working with real numbers. A linear ordering of dozens of LUs is nearly impossible, and even comparing two terms requires a significantly complicated setup (Rubenstein and Goodenough, 1965). Given a small sample of the lists of the LUs most semantically related to a given one, e.g., Tables 3.11 and 3.12, people can easily distinguish a bad MSR from a good one; we must distinguish good MSRs from those that are merely passable from the perspective of support for linguists working on wordnet development.

We note three forms of MSR evaluation (Budanitsky and Hirst, 2006, Zesch and Gurevych, 2006):

• mathematical analysis of formal properties (for example, the property of a metric distance (Lin, 1998)),
• application-specific evaluation,
• and comparison with human judgement.

Mathematical analysis gives few clues with respect to the results of future applications of an MSR. Evaluation via an application may make it difficult to separate the effect of an MSR from that of other elements of the application (Zesch and Gurevych, 2006). A direct comparison to a manually created resource seems the least trouble-free. The construction of such resources, however, is labour-intensive even if it only labels LU pairs as similar (or merely related (Zesch and Gurevych, 2006)) or not similar; this does not allow a fair assessment of the ordering of LUs on a continuous scale, as an MSR does.

Indirect comparison with existing resources (Grefenstette, 1993) is another possibility.
For example, one could compare an MSR constructed automatically with another based on the semantic similarity across the hypernymy structure of PWN. This is how the main approaches work – see (Lin, 1998, Weeds and Weir, 2005, Geffet and Dagan, 2004). Two lists of the k LUs most similar to the given one – for example, one constructed from an MSR and one from a wordnet – are transformed into rank numbers of the subsequent LUs on the lists, and compared by the cosine measure. The drawback of such an evaluation is that we know how close the two similarity functions are, but not how people perceive an MSR. The evaluation also strongly depends on the wordnet similarity function applied. There are a number of such functions – see (Budanitsky and Hirst, 2006) – but many of them perform indifferently for a small wordnet without a full-fledged hypernymy structure (like the core plWordNet that we had at our disposal during most experiments) or require synset probabilities. Moreover, wordnet similarity functions based on the hypernymy structure do not always work for verbs and adjectives, whose hierarchies tend to be quite limited. The similarity measure proposed by Mihalcea and Moldovan (1999) also does not apply in our case because plWordNet, like many other new wordnets, does not yet include glosses.
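The rank-based list comparison described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the exact rank-to-weight mapping (earlier list positions receive higher weights, absent LUs receive zero) are assumptions made for the example.

```python
import math

def rank_vector(neighbours, vocabulary):
    # Hypothetical weighting: the first of k neighbours gets weight k,
    # the last gets weight 1, and LUs absent from the list get 0.
    k = len(neighbours)
    ranks = {lu: k - i for i, lu in enumerate(neighbours)}
    return [ranks.get(lu, 0) for lu in vocabulary]

def cosine(u, v):
    # Standard cosine measure between two weight vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def list_agreement(msr_list, wordnet_list):
    # Compare the top-k list produced by an MSR with the top-k list
    # produced by a wordnet similarity function, via rank vectors.
    vocab = sorted(set(msr_list) | set(wordnet_list))
    return cosine(rank_vector(msr_list, vocab),
                  rank_vector(wordnet_list, vocab))
```

Identical lists yield an agreement of 1.0 and disjoint lists yield 0.0, so the score directly reflects how closely the two similarity functions rank the same neighbours.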
