A Wordnet from the Ground Up

4.5. Hybrid Combinations

only (F-score = 0.3268) and measured in relation to manually annotated examples. Our problem setting is more difficult (we expect the classifier to distinguish, e.g., between P2 and P3, while Snow et al. included all indirect hypernyms in KH), and we had far fewer learning examples. Also, Snow et al. worked with a hybrid system that combined the hypernymy classifier with an MSR; it is more closely related to our WordNet Weaver system presented in the next section. Snow et al. obtained their best F-score of 0.2714 for the classifier-only version.

Leaving automatic evaluation aside, one can notice that the percentage of false positives is still significantly below 50%, a ratio that seems acceptable for a tool to support linguists. On the other hand, the number of LU pairs presented to linguists dropped dramatically in comparison to MSR RWF alone, from 2300 to 733, i.e. 31.87% of the initial list. The classifier cannot be used alone as a support tool, but its ability to 'concentrate' KH pairs in the positively classified group will be leveraged in the next section for the construction of a tool that combines different types of evidence in expanding plWordNet.

The results achieved on M for all classifiers were much poorer than the results on sets selected from plWordNet.
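As a minimal sketch of the arithmetic behind the figures quoted above: the F-score is the harmonic mean of precision and recall, and the list reduction is the ratio of pairs kept after classification to the initial candidate list (2300 to 733 pairs, per the text). The function name below is illustrative, not from the thesis.

```python
def f_score(precision: float, recall: float) -> float:
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Reduction of the candidate list after classifier filtering,
# using the counts given in the text: 2300 pairs -> 733 pairs.
initial_pairs = 2300
kept_pairs = 733
reduction = kept_pairs / initial_pairs
print(f"{reduction:.2%}")  # prints "31.87%" of the initial list
```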
We tried SVM as well, hoping for its usually good performance on numerical features without discretisation, but in contrast with the findings of Kennedy (2006) we did not achieve any valuable results. In Figure 4.7 we present examples of classifier decisions made for elements of set M (classifier C4.5, KH-to-NK ratio 1:10, E included in NK).

A manual inspection of false positives in the classification results on set M shows that many pairs are co-hyponyms. They can be treated as positive answers from a linguist's point of view, but we tried to train the classifier not to select co-hyponyms as relevant pairs.

The results achieved on the data extracted from plWordNet are very promising, especially when we compare them to the results of similar experiments in (Snow et al., 2005), where the highest F-score was 0.348. A direct comparison, however, is not possible, because we used examples of KH and NH generated directly from plWordNet, not from sentences in the corpora. Randomly generated pairs can include a larger percentage of obviously negative cases. On the other hand, plWordNet is much smaller than the PWN applied in (Snow et al., 2005), so some NH pairs are in fact relevant pairs not yet added to plWordNet. This introduces substantial noise during training.

The results on the manually annotated and manually inspected set M show that the performance of the classifiers on real data is lower. They have problems distinguishing co-hyponym pairs from relevant pairs, and there are more errors for less obvious cases.
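The 1:10 KH-to-NK sampling mentioned above can be sketched as follows. This is a hypothetical reconstruction of how such an imbalanced training sample might be assembled, not the thesis's actual code; the function name and example pairs are invented for illustration.

```python
import random

def build_training_sample(kh_pairs, nk_pairs, ratio=10, seed=0):
    """Combine all known-hypernymy (KH) pairs, labelled 1, with a random
    sample of non-hypernymy (NK) pairs, labelled 0, at the given
    NK-to-KH ratio (1:10 in the experiment described in the text)."""
    rng = random.Random(seed)
    n_neg = min(len(nk_pairs), ratio * len(kh_pairs))
    data = [(p, 1) for p in kh_pairs]
    data += [(p, 0) for p in rng.sample(nk_pairs, n_neg)]
    rng.shuffle(data)
    return data

# Toy usage with one positive pair and a pool of 20 negative candidates.
sample = build_training_sample([("pies", "zwierzę")], [("pies", "kot")] * 20)
```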
Still, if we consider the task of delivering valuable suggestions to the linguists, we have achieved an enormous improvement in comparison with the lists of k most semantically related LUs. That is to say, a majority of the list elements are
