A Wordnet from the Ground Up

3.3 Evaluation

…any pair of LUs, but also people are notoriously bad at working with real numbers. A linear ordering of dozens of LUs is nearly impossible, and even comparing two terms requires a significantly complicated setup (Rubenstein and Goodenough, 1965). Given a small sample of the lists of the LUs most semantically related to a given one, e.g., Tables 3.11 and 3.12, people can easily distinguish a bad MSR from a good one; we must distinguish good MSRs from those that are merely passable from the perspective of support for linguists working on wordnet development.

We note three forms of MSR evaluation (Budanitsky and Hirst, 2006, Zesch and Gurevych, 2006):

• mathematical analysis of formal properties (for example, the property of a metric distance (Lin, 1998)),
• application-specific evaluation,
• and comparison with human judgement.

Mathematical analysis gives few clues with respect to the results of future applications of an MSR. Evaluation via an application may make it difficult to separate the effect of an MSR from that of other elements of the application (Zesch and Gurevych, 2006). A direct comparison to a manually created resource seems the least trouble-free. The construction of such resources, however, is labour-intensive even if it only labels LU pairs as similar (or merely related (Zesch and Gurevych, 2006)) or not similar; this does not allow a fair assessment of the ordering of LUs on a continuous scale, as an MSR does.

Indirect comparison with existing resources (Grefenstette, 1993) is another possibility.
For example, one could compare an MSR constructed automatically with another based on the semantic similarity across the hypernymy structure of PWN. This is how the main approaches work – see (Lin, 1998, Weeds and Weir, 2005, Geffet and Dagan, 2004). Two lists of the k LUs most similar to the given one – for example, one constructed from an MSR and one from a wordnet – are transformed into rank numbers of the subsequent LUs on the lists, and compared by the cosine measure. The drawback of such an evaluation is that we know how close the two similarity functions are, but not how people perceive an MSR. The evaluation also strongly depends on the wordnet similarity function applied. There are a number of such functions – see (Budanitsky and Hirst, 2006) – but many of them perform indifferently for a small wordnet without a full-fledged hypernymy structure (like the core plWordNet that we had at our disposal during most experiments) or require synset probabilities. Moreover, wordnet similarity functions based on the hypernymy structure do not always work for verbs and adjectives, whose hierarchies tend to be quite limited. The similarity measure proposed by Mihalcea and Moldovan (1999) also does not apply in our case because plWordNet, like many other new wordnets, does not yet include glosses.
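The rank-based list comparison described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the exact rank-to-weight mapping (earlier list positions receive higher weights, absent LUs receive zero) are assumptions made for the example.

```python
import math

def rank_vector(neighbours, vocabulary):
    # Hypothetical weighting: the first of k neighbours gets weight k,
    # the last gets weight 1, and LUs absent from the list get 0.
    k = len(neighbours)
    ranks = {lu: k - i for i, lu in enumerate(neighbours)}
    return [ranks.get(lu, 0) for lu in vocabulary]

def cosine(u, v):
    # Standard cosine measure between two weight vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def list_agreement(msr_list, wordnet_list):
    # Compare the top-k list produced by an MSR with the top-k list
    # produced by a wordnet similarity function, via rank vectors.
    vocab = sorted(set(msr_list) | set(wordnet_list))
    return cosine(rank_vector(msr_list, vocab),
                  rank_vector(wordnet_list, vocab))
```

Identical lists yield an agreement of 1.0 and disjoint lists yield 0.0, so the score directly reflects how closely the two similarity functions rank the same neighbours.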
