06.08.2015

A Wordnet from the Ground Up

A Wordnet from the Ground Up - School of Information Technology ...




Chapter 3. Discovering Semantic Relatedness

probabilities, we used a slightly modified version of Leacock's similarity measure (Agirre and Edmonds, 2006):

$$\mathrm{sim}(s_1, s_2) = -\log\left(\frac{\mathrm{Path}(s_1, s_2)}{\max_{s_a, s_b} \mathrm{Path}(s_a, s_b)}\right) \tag{3.9}$$

where $\mathrm{Path}(a, b)$ is the length of a path between two synsets $a$ and $b$ in plWordNet.

Apart from synset similarity, we strictly follow Lin and Pantel (2002) and Pantel (2003) in all other aspects of word-sense evaluation. Synset similarity is used to define the similarity between a word $w$ and a synset $s$. Let $S(w)$ be the set of wordnet synsets that include $w$ (i.e., its senses). The similarity between $s$ and $w$ is defined as follows:

$$\mathrm{simW}(s, w) = \max_{t \in S(w)} \mathrm{sim}(s, t) \tag{3.10}$$

The similarity of a synset $s$ (a sense recorded in a wordnet) and a group of LUs $c$ (an extracted sense) is defined as the average similarity between $s$ and the LUs belonging to $c$. LU groups extracted by CBC have no strict boundaries: their members differ in how similar they are to the corresponding committee (sense pattern). Lin and Pantel (2002) and Pantel (2003) define the core of an LU group via a threshold $\kappa$²⁶ on the number of LUs belonging to the core. Let $c_\kappa$ be the core of $c$, i.e., the subset of the $\kappa$ members of $c$ that are most similar to $c$'s committee. The similarity of $c$ and $s$ is defined as follows:

$$\mathrm{simC}(s, c) = \frac{\sum_{u \in c_\kappa} \mathrm{simW}(s, u)}{\kappa} \tag{3.11}$$

We assume that a group $c$ corresponds to a correct sense of $w$ if, for a threshold $\theta$,

$$\max_{s \in S(w)} \mathrm{simC}(s, c) \ge \theta \tag{3.12}$$

The wordnet sense of an LU $w$ that corresponds to the sense of $w$ represented by a lemma group $c$ is defined as the synset that maximizes the value in formula (3.12):

$$\arg\max_{s \in S(w)} \mathrm{simC}(s, c) \tag{3.13}$$

The question arises why this evaluation procedure is so indirect: why do we not compare the cores of the LU groups directly with wordnet synsets? The answer is seemingly simple.
In both Polish and English, definite matches are hard to obtain. LU groups depend indirectly on the MSR used. They have no clear boundaries: they show some closeness to a sense, but not to a strictly defined one. On the other hand, wordnet synsets also involve a substantial degree of subjectivity in their definitions, especially when they are intended to describe concepts which are not directly observable

²⁶ We changed the original symbol $k$ to $\kappa$ so as not to confuse it with $k$ in the algorithm.
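The chain of formulas 3.9 to 3.13 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy synset graph, the `senses` mapping from LUs to synsets, and all function names are hypothetical, and we assume that Path is counted in nodes along the shortest path (so that Path(s, s) = 1 and the logarithm stays finite). The core $c_\kappa$ is passed in directly as a list of $\kappa$ LUs, and $\theta$ is a free parameter.

```python
import math
from collections import deque

# Hypothetical toy synset graph with undirected hypernymy links.
# It is NOT plWordNet data; it only makes the formulas runnable.
GRAPH = {
    "entity": ["animal", "artifact"],
    "animal": ["entity", "dog", "cat"],
    "dog": ["animal"],
    "cat": ["animal"],
    "artifact": ["entity", "vehicle"],
    "vehicle": ["artifact", "car"],
    "car": ["vehicle"],
}


def path_nodes(graph, a, b):
    """Path(a, b): shortest path between synsets a and b found by BFS.

    Assumption: the length is counted in nodes, so Path(s, s) = 1 and the
    logarithm in (3.9) never receives 0. The graph is assumed connected.
    """
    dist = {a: 1}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return dist[node]
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    raise ValueError(f"no path between {a!r} and {b!r}")


def max_path(graph):
    """Denominator of (3.9): the largest Path over all synset pairs."""
    return max(path_nodes(graph, a, b) for a in graph for b in graph)


def sim(graph, s1, s2, max_len):
    """Formula (3.9): -log(Path(s1, s2) / max Path)."""
    return -math.log(path_nodes(graph, s1, s2) / max_len)


def sim_w(graph, s, senses_of_w, max_len):
    """Formula (3.10): similarity of synset s to the closest sense t in S(w)."""
    return max(sim(graph, s, t, max_len) for t in senses_of_w)


def sim_c(graph, s, core, senses, max_len):
    """Formula (3.11): mean simW(s, u) over the kappa LUs u of the core c_kappa."""
    kappa = len(core)
    return sum(sim_w(graph, s, senses[u], max_len) for u in core) / kappa


def wordnet_sense(graph, w, core, senses, theta, max_len):
    """Formulas (3.12) and (3.13): the synset of w that best matches group c,
    or None when even the best simC score is below theta."""
    scores = {s: sim_c(graph, s, core, senses, max_len) for s in senses[w]}
    best = max(scores, key=scores.get)
    return best if scores[best] >= theta else None


if __name__ == "__main__":
    # Hypothetical S(.) mapping from LUs to the synsets that contain them.
    senses = {
        "jaguar": ["cat", "car"],
        "puma": ["cat"],
        "hound": ["dog"],
    }
    m = max_path(GRAPH)  # 6 nodes: dog -> animal -> entity -> artifact -> vehicle -> car
    core = ["puma", "hound"]  # hypothetical animal-like LU group core, kappa = 2
    print(wordnet_sense(GRAPH, "jaguar", core, senses, theta=0.5, max_len=m))  # cat
    print(wordnet_sense(GRAPH, "jaguar", core, senses, theta=1.5, max_len=m))  # None
```

Here the animal-like core pulls "jaguar" towards its "cat" synset (simC = (log 6 + log 2) / 2, about 1.24), while the "car" synset scores 0 because it lies at the maximal path distance. Raising θ above that score makes (3.12) fail, so no wordnet sense is assigned. Passing the core in directly keeps the sketch independent of CBC itself.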
