4.5. Hybrid Combinations

Witschel (2005) applied a more radical decision-tree model with recursive upward propagation of meaning descriptions. The propagation stops only in the root, and the description of the upper nodes represents the description of their descendants. A synset's semantic description is a set of LUs most similar to the LUs from this synset. Similarity calculation, following the distributional semantics model, is based on co-occurrences of LUs in a corpus. Semantic descriptions of child nodes are recursively propagated to their parents and merged with the parents' initial descriptions. The resulting tree of semantic descriptions is then used as a decision tree to assign new lemmas. We select a branch by the highest similarity to the new lemma, measured by the degree of matching between descriptions. Downward traversal stops in a node in which the mean of the similarity values with the branches is greater than their variance. Evaluation was performed only on two subtrees taken from GermaNet: Moebel (furniture) (144 children) and Bauwerk (building) (902 children). The best accuracy of exact classification was 14% and 11% respectively, comparable to that achieved by Alfonseca and Manandhar (2002).

Widdows (2003) represented LU meaning by a set of semantic neighbours: the k most similar LUs. The main idea for attaching a new lemma was to find a site in the hypernymy structure in which its semantic neighbours are concentrated. For semantic similarity calculation, each LU was first described by its co-occurrence, within a 15-word text window, with the 1000 most frequent one-word LUs. Parts of speech were attached to words in the experiments that gave the best results. Similarity values were computed as in the Latent Semantic Analysis algorithm (Landauer and Dumais, 1997), cf. Section 3.4.2. For a given LU and its first k semantic neighbours, a hypernym h is chosen as its label (attachment point) such that it gives the highest sum of affinity scores between the subsequent neighbours and h. The affinity score is negative for neighbours which are not hyponyms of h, and positive otherwise, with a higher value for neighbours closer to h.

Evaluation was on the British National Corpus (BNC, 2007) and randomly selected common nouns, 200 each from three frequency ranges: >1000, [500, 1000] and
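As an illustration only, the following sketch shows how the downward decision-tree descent with the mean-versus-variance stopping rule described for Witschel (2005) could be coded. The Node class, the Jaccard-based matching function and all names are assumptions of the sketch, not the original implementation.

```python
from dataclasses import dataclass, field
from statistics import mean, pvariance


@dataclass
class Node:
    """A node of the propagated tree; its merged semantic description
    is represented here as a plain set of LUs (an assumption)."""
    name: str
    description: set
    children: list = field(default_factory=list)


def description_match(lemma_description, node_description):
    """Degree of matching between the new lemma's description (its most
    similar LUs) and a node's merged description; Jaccard overlap is an
    illustrative choice, not Witschel's exact measure."""
    union = lemma_description | node_description
    return len(lemma_description & node_description) / len(union) if union else 0.0


def attach_lemma(root, lemma_description):
    """Descend from the root, always following the best-matching branch;
    stop in a node where the mean of the branch similarities exceeds
    their variance (the stopping rule quoted above), or in a leaf."""
    node = root
    while node.children:
        sims = [description_match(lemma_description, child.description)
                for child in node.children]
        if mean(sims) > pvariance(sims):
            break  # no branch stands out enough: stop at this node
        node = node.children[sims.index(max(sims))]
    return node  # proposed attachment point for the new lemma
```

A similarly minimal sketch of the Widdows (2003) attachment criterion follows, assuming a caller-supplied hypernym_chain function that returns an LU's ancestors, nearest first. The 1/depth scoring and the fixed penalty for non-hyponyms are illustrative choices consistent with the description above, not the paper's exact affinity score.

```python
def affinity(neighbour, h, hypernym_chain, penalty=-0.25):
    """Affinity of one semantic neighbour to a candidate hypernym h:
    negative if the neighbour is not a hyponym of h, otherwise positive
    and larger the closer the neighbour is to h."""
    chain = hypernym_chain(neighbour)  # ancestors of the neighbour, nearest first
    if h not in chain:
        return penalty
    return 1.0 / (chain.index(h) + 1)


def best_attachment(neighbours, candidate_hypernyms, hypernym_chain):
    """Pick the hypernym that maximizes the summed affinity of the new
    lemma's k semantic neighbours, i.e. the site in the hypernymy
    structure where those neighbours are most concentrated."""
    return max(candidate_hypernyms,
               key=lambda h: sum(affinity(n, h, hypernym_chain)
                                 for n in neighbours))
```

The sketch leaves the choice of candidate hypernyms to the caller; a natural choice, though an assumption here, is to consider the ancestors of the semantic neighbours themselves.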
