06.08.2015 Views

A Wordnet from the Ground Up

A Wordnet from the Ground Up - School of Information Technology ...


144 Chapter 4. Extracting Relation Instances

hypernymy structure is normally a graph. In any case, no automatic method can come up with a credible top portion of a wordnet hierarchy. That is why the top levels of plWordNet’s hypernymy hierarchy have been built manually, and we defined the focus of our research as a semi-automatic expansion of the core plWordNet. The constructed core plWordNet then serves as the springboard for what have turned out to be useful suggestions of attaching new lemmas to particular synsets in plWordNet. Such lemmas would be attached as related to, but not necessarily synonymous with, LUs in those synsets.

Several projects have explored the idea of building an expanded wordnet over an existing one. Most of them are focused, and have been tested, only on PWN. The advantage is the possibility of using the wordnet structure already in place, especially the hypernymy structure, as a knowledge source.

Caraballo (1999, 2001) discusses an interesting attempt to overcome those problems. In her approach, the meaning of nouns is described simultaneously in two ways. In a distributional semantics model, for each noun a vector is constructed with the co-occurrence frequencies of this noun and other nouns in coordinate and appositive constructions. The frequencies are collected from parsed text. In a pattern-based model, hypernym pairs are extracted by Hearst’s pattern (Hearst, 1992) X, Y, and other Zs.
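The pattern-based side of this description can be illustrated with a minimal sketch. It assumes plain, unparsed English text, single-word nouns, and a crude plural-stripping step in place of lemmatisation; the function names are hypothetical, and Caraballo worked with parsed text rather than a regular expression.

```python
import re
from collections import Counter

# Assumption: a regular expression over raw text stands in for the
# parsed constructions used in the original work.
# Matches "X, Y, and other Zs" as well as "X and other Zs".
PATTERN = re.compile(r"\b((?:\w+, )*\w+),? and other (\w+)")


def naive_singular(word):
    """Strip a trailing 's'; a hypothetical stand-in for lemmatisation."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def extract_hypernym_pairs(text):
    """Count (hyponym, hypernym) pairs matched by the pattern."""
    pairs = Counter()
    for match in PATTERN.finditer(text.lower()):
        hypernym = naive_singular(match.group(2))
        for hyponym in match.group(1).split(", "):
            pairs[(naive_singular(hyponym), hypernym)] += 1
    return pairs


pairs = extract_hypernym_pairs(
    "Banks, brokers, and other institutions reported losses."
)
for (hyponym, hypernym), count in sorted(pairs.items()):
    print(hyponym, "->", hypernym, count)
```

The counts matter because the most frequent hypernym candidates are the ones later used to label clusters.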
The vectors give a cosine-measure similarity of nouns and noun clusters. A binary tree of clusters is built following the scheme of agglomerative clustering. Next, internal tree nodes are assigned hypernyms of the branches by extracting from the pattern-based pairs the most frequent hypernyms of the LUs in the given branch. Finally, the binary tree is “compressed” by removing internal nodes that have no hypernyms assigned or represent the same hypernyms as their parent node. A manual evaluation of a randomly selected sample showed that on average 33% of nouns were assigned correctly as hyponyms of the examined hypernyms. The sample was very small and not representative, and a 33% precision is similar to the precision achieved in our experiments on pattern-based hypernym extraction. Caraballo’s approach, while interesting, required parsing (a drawback if no good parser is available) and was applied to the limited domain of economics and to texts from the Wall Street Journal. The achieved precision seems limited and directly correlated with the precision of the patterns, and the constructed hierarchy is far from the wordnet synset structure: the number of internal nodes is small in comparison to the number of leaf clusters and their large size.

Alfonseca and Manandhar (2002) assigned to synsets a meaning representation based on a distributional semantics model, and treated the hypernymy structure labelled in that way as a kind of decision tree.
To find a site for a new lemma, the tree is traversed top-down each time, choosing the branch with the highest distributional similarity. The top-level synsets were mostly very general, so the authors introduced a limited propagation of meaning vectors from children to parents.
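The top-down search can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the `Synset` class, the damping weight of 0.5, the stopping rule (stop when no child is more similar than the current node) and the toy vectors are all hypothetical choices made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(feature, 0.0) for feature, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


class Synset:
    """Hypothetical synset node: a name, a meaning vector, hyponym children."""

    def __init__(self, name, vector, children=()):
        self.name = name
        self.vector = dict(vector)
        self.children = list(children)


def propagate(node, weight=0.5):
    """Add a damped share of each child's vector to its parent, bottom-up.

    Assumption: damping by `weight` is one possible reading of
    "limited propagation"; the source does not specify the scheme.
    """
    for child in node.children:
        propagate(child, weight)
        for feature, value in child.vector.items():
            node.vector[feature] = node.vector.get(feature, 0.0) + weight * value


def find_attachment_site(root, lemma_vector):
    """Greedy descent: follow the most similar child while it beats the current node."""
    node = root
    while node.children:
        best = max(node.children, key=lambda c: cosine(c.vector, lemma_vector))
        if cosine(best.vector, lemma_vector) <= cosine(node.vector, lemma_vector):
            break
        node = best
    return node


def build_example_tree():
    """Toy hierarchy with invented co-occurrence features."""
    dog = Synset("dog", {"bark": 2, "tail": 1})
    cat = Synset("cat", {"meow": 2, "tail": 1})
    car = Synset("car", {"engine": 2, "wheel": 1})
    animal = Synset("animal", {}, [dog, cat])
    vehicle = Synset("vehicle", {}, [car])
    root = Synset("entity", {}, [animal, vehicle])
    propagate(root)
    return root


root = build_example_tree()
site = find_attachment_site(root, {"bark": 1, "tail": 1})
print(site.name)
```

Without the propagation step the general top-level nodes in this toy tree would have empty vectors, every branch would score zero, and the descent would never leave the root, which is the problem the propagation is meant to address.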
