A Wordnet from the Ground Up

    or(
      and(
        equal(cas[0],{nom}),
        rlook(1,end,$X,
          inter(flex[$X],{adjectival participles, noun, pronouns, verbal grammatical classes})),
        equal(base[$X],{"być"}),
        rlook($+1X,end,$Y,
          or(
            inter(flex[$Y],{adjectival passive participle, noun, pronouns, verbal grammatical classes}),
            and(
              equal(flex[$Y],{prep}),
              equal(cas[$Y],{inst})))),
        inter(flex[$Y],{subst,ger,depr}),
        equal(base[$Y],{"NP2"}),
        equal(cas[$Y],{inst}),
        equal(nmb[$Y],nmb[0])),
      a symmetrical condition for the right context
    )

Figure 4.1: The essentials of the JestInst pattern implementation in JOSKIPI

that they have very similar accuracy. That is why we decided to merge them into a complex pattern that combines the constraints using the or operator. We will refer to this pattern as mIInne; see, for example, Table 4.1.

4.2 Benefits of Handwritten Patterns for Wordnet Expansion

We ran experiments on the extraction of hypernymic pairs on the same three corpora as those used for MSR extraction (Section 3.4.5): the IPI PAN Corpus [IPIC] (≈ 254 million tokens) (Przepiórkowski, 2004), the Rzeczpospolita corpus [RzCorp] (≈ 113 million tokens) (Rzeczpospolita, 2008), and a corpus of large texts in Polish from the Internet (≈ 214 million tokens) [WebCorp]. Table 4.1 presents detailed results for three patterns: JestInst, NomToNom and mIInne.

We assessed the accuracy manually on randomly selected samples. Similarly to other manual evaluations (for example, Section 3.4.5), we determined sample sizes following the method discussed in (Israel, 1992), aiming for the 95% confidence level on the whole population. We used a program named Sprawdzacz (Kurc, 2008) that facilitates manual evaluation of the extracted lexico-semantic relation instances. [5]

[5] We thank Roman Kurc for his great help with the whole plWordNet project.
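
The sample-size step can be illustrated with a minimal sketch. The text only names the source of the method and the 95% confidence level. The sketch assumes Cochran's formula with a finite population correction, a commonly used approach of this kind. The margin of error (±5%), the proportion p = 0.5, and the function name cochran_sample_size are assumptions made for illustration and are not taken from the text.

```python
import math


def cochran_sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Sample size for estimating a proportion in a finite population.

    z=1.96 corresponds to the 95% confidence level mentioned in the text.
    margin and p are illustrative assumptions, not values from the source.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    # Sample size for an infinitely large population.
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    # Finite population correction shrinks the sample for small populations.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical counts of extracted pairs, used only to show the call.
    for extracted_pairs in (1_000, 10_000, 100_000):
        print(extracted_pairs, cochran_sample_size(extracted_pairs))
```

Under these assumptions the required sample grows slowly with the number of extracted pairs, which is why a few hundred manually checked pairs per pattern and corpus can suffice even for large result sets.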
    and(
      in(cas[0],nom),
      llook(-1,begin,$T,equal(base[$T],{"taki"})),
      equal(base[$+1T],{"jak"}),
      only($+2T,-1,$AR,
        or(
          inter(flex[$AR],{adjective, adjectival participles, adverb, adverbial participles, noun, numeral}),
          in(orth[$AR],{"i","lub","czy","oraz","a",",",":","(",")"}))),
      llook($-1T,begin,$N,
        and(
          inter(flex[$N],{noun}),
          equal(base[$N],{"base form of NLU2"}),
          in(cas[$N],{nom,acc,dat,inst,loc,voc}))),
      only($+1N,$-1T,$AL,
        or(
          inter(flex[$AL],{adjective, adjectival participles, adverb, adverbial participles, numeral}),
          and(
            inter(flex[$AL],{noun, pronouns}),
            equal(cas[$AL],{gen}))))
    )

Figure 4.2: The essentials of the TakichJak pattern implementation in JOSKIPI

During the evaluation, an extracted LU pair could be classified as a correct instance of hypernymy (possibly indirect, with longer paths accepted), or as one of two forms of nearly correct instances:

• not the expected hyponym/hypernym order; such pairs occurred more often among the results of the NomToNom pattern in which the direction is not marked by grammatical case;

• small inaccuracies in one of the LUs: it is part of a larger multiword LU, or it has a wrong number value, or it is represented by a wrong root (a tagger error).

All other pairs were classified as incorrect. The results in Table 4.1 have been calculated with the assumption that correct and nearly correct instances are positive. If we excluded the nearly correct class, the results would be about 20% lower. The results would be very low if we only sought direct hypernymy. This clearly suggests that the extracted pairs are not directly helpful in expanding the core plWordNet, but they still are a valuable source of knowledge. They show not only semantic similarity of the LUs in a pair, but also the direction of the relation. Indirect hypernyms can be helpful