4.5. Hybrid Combinations

Witschel (2005) applied a more radical decision-tree model with recursive upward propagation of meaning descriptions. The propagation stops only in the root, and the description of the upper nodes represents the description of their descendants. A synset's semantic description is a set of LUs most similar to the LUs from this synset. Similarity calculation, following the distributional semantics model, is based on co-occurrences of LUs in a corpus. Semantic descriptions of child nodes are recursively propagated to their parents and merged with the parents' initial descriptions. The resulting tree of semantic descriptions is then used as a decision tree to assign new lemmas. We select a branch by the highest similarity to the new lemma, measured by the degree of matching between descriptions. Downward traversal stops in a node in which the mean of the similarity values with the branches is greater than their variance. Evaluation was performed only on two subtrees taken from GermaNet: Moebel (furniture) (144 children) and Bauwerk (building) (902 children). The best accuracy of exact classification was 14% and 11% respectively, comparable to that achieved by Alfonseca and Manandhar (2002).

Widdows (2003) represented LU meaning by a set of semantic neighbours: the k most similar LUs. The main idea for attaching a new lemma was to find a site in the hypernymy structure in which its semantic neighbours are concentrated. For semantic similarity calculation, each LU was first described by its co-occurrence, within a 15-word text window, with the 1000 most frequent one-word LUs. Parts of speech were attached to words in the experiments that gave the best results. Similarity values were computed as in the Latent Semantic Analysis algorithm (Landauer and Dumais, 1997), cf. Section 3.4.2. For a given LU and its first k semantic neighbours, a hypernym h is chosen as its label (attachment point) such that it gives the highest sum of affinity scores between the subsequent neighbours and h. The affinity score is negative for neighbours which are not hyponyms of h, and positive otherwise, with a higher value for neighbours closer to h.

Evaluation was on the British National Corpus (BNC, 2007) and randomly selected common nouns, 200 each from three frequency ranges: >1000, [500, 1000] and
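As an illustration only, the following sketch shows how the downward decision-tree descent with the mean-versus-variance stopping rule described for Witschel (2005) could be coded. The Node class, the Jaccard-based matching function and all names are assumptions of the sketch, not the original implementation.

```python
from dataclasses import dataclass, field
from statistics import mean, pvariance


@dataclass
class Node:
    """A node of the propagated tree; its merged semantic description
    is represented here as a plain set of LUs (an assumption)."""
    name: str
    description: set
    children: list = field(default_factory=list)


def description_match(lemma_description, node_description):
    """Degree of matching between the new lemma's description (its most
    similar LUs) and a node's merged description; Jaccard overlap is an
    illustrative choice, not Witschel's exact measure."""
    union = lemma_description | node_description
    return len(lemma_description & node_description) / len(union) if union else 0.0


def attach_lemma(root, lemma_description):
    """Descend from the root, always following the best-matching branch;
    stop in a node where the mean of the branch similarities exceeds
    their variance (the stopping rule quoted above), or in a leaf."""
    node = root
    while node.children:
        sims = [description_match(lemma_description, child.description)
                for child in node.children]
        if mean(sims) > pvariance(sims):
            break  # no branch stands out enough: stop at this node
        node = node.children[sims.index(max(sims))]
    return node  # proposed attachment point for the new lemma
```

A similarly minimal sketch of the Widdows (2003) attachment criterion follows, assuming a caller-supplied hypernym_chain function that returns an LU's ancestors, nearest first. The 1/depth scoring and the fixed penalty for non-hyponyms are illustrative choices consistent with the description above, not the paper's exact affinity score.

```python
def affinity(neighbour, h, hypernym_chain, penalty=-0.25):
    """Affinity of one semantic neighbour to a candidate hypernym h:
    negative if the neighbour is not a hyponym of h, otherwise positive
    and larger the closer the neighbour is to h."""
    chain = hypernym_chain(neighbour)  # ancestors of the neighbour, nearest first
    if h not in chain:
        return penalty
    return 1.0 / (chain.index(h) + 1)


def best_attachment(neighbours, candidate_hypernyms, hypernym_chain):
    """Pick the hypernym that maximizes the summed affinity of the new
    lemma's k semantic neighbours, i.e. the site in the hypernymy
    structure where those neighbours are most concentrated."""
    return max(candidate_hypernyms,
               key=lambda h: sum(affinity(n, h, hypernym_chain)
                                 for n in neighbours))
```

The sketch leaves the choice of candidate hypernyms to the caller; a natural choice, though an assumption here, is to consider the ancestors of the semantic neighbours themselves.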
