06.08.2015 Views

A Wordnet from the Ground Up

A Wordnet from the Ground Up - School of Information Technology ...

3.3. Evaluation

3.3.1 Wordnet-based synonymy test for Polish

The application of LSA to TOEFL data became unattractive as a method of comparing MSRs once the result of 97.5% hits had been achieved (Turney et al., 2003). Freitag et al. (2005) proposed a new test, WBST. It was seen as more difficult because it contained many more questions. An instance of the test consists of many (hundreds or even thousands of) question-answer pairs [QA pairs]: $\langle q, A \rangle$, where $A = \{a_1, a_2, a_3, a_4\}$ and $q$, $a_i$ are LUs included in the wordnet that underlies the test (Freitag et al. (2005) used PWN 2.0). In each QA pair there is an $a_i$, henceforth called the correct answer, such that there is a synset $S$ in the wordnet and $q$, $a_i$ belong to $S$. None of the other three $a_j$ belongs to the same synset as $q$ or as $a_i$. We will call such $a_j$ detractors for the given QA pair. During evaluation, an MSR generates values for the pairs $\langle q, a_i \rangle$, $a_i \in A$, expected to favour the correct answer against the detractors.

The WBST has been, amongst other applications, used to evaluate MSRs for Polish LUs (nominal, verbal and adjectival). The underlying resource was plWordNet, used in different development versions for different tests. Further in this section we discuss how the wordnet used influences the difficulty of the test.

The test had to be slightly modified. In plWordNet, many synsets have only 1–2 LUs, in accordance with the definition of the synset and the usage of LUs as basic plWordNet entries; see Section 2.1.
In order to get a better coverage of LUs by WBST questions, and not to leave LUs in singleton synsets untested, the direct hypernyms of LUs from singleton synsets were taken to form QA pairs¹ (Piasecki et al., 2007a). We named this modification the WBST with Hypernyms [WBST+H]. The inclusion of hypernyms in QA pairs did not make the test easier, as was shown in (Piasecki et al., 2007a).

plWordNet has been evolving from the early versions, which included fewer LUs, broader synsets with a vaguer understanding of near-synonymy (a larger percentage of synsets with more than two LUs) and a shallower hypernymy structure, to the present version of plWordNet expanded semi-automatically (Section 4.5.3), in which most synsets are narrow (1–2 LUs on average) and the hypernymy structure is significantly deeper. Broader synsets place in the same synset LUs that are hard to distinguish using an MSR (they are very close in meaning). There is, therefore, no need to distinguish between their meanings during the test. In a version of plWordNet with narrower synsets, the same LUs may have already been separated into two different synsets, usually not related by direct hypernymy. One can expect that narrower synsets obtained by dividing a broader one would be co-hyponyms. The hypernymy hierarchy deepened with the subsequent versions of plWordNet. This tendency was due to the

¹ In the case of adjectival LUs this technique has a limited application, because the number of hypernymy instances is very small in the case of adjectival synsets: only 142 instances (plWordNet from October 2008).
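The construction of WBST and WBST+H QA pairs described in this section can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy synsets and hypernymy links, the names `build_qa_pairs`, `accuracy` and `oracle_msr`, the random choice of detractors, and the reading of WBST+H in which an LU of the direct hypernym synset serves as the correct answer for an LU from a singleton synset. It is not the procedure used by Freitag et al. (2005) or Piasecki et al. (2007a).

```python
import random
from typing import Callable, Dict, List, Set, Tuple

# Toy wordnet (hypothetical data): synset id -> lexical units (LUs).
SYNSETS: Dict[str, Set[str]] = {
    "s_car": {"car", "automobile"},
    "s_vehicle": {"vehicle"},
    "s_truck": {"truck"},
    "s_dog": {"dog", "hound"},
    "s_animal": {"animal"},
    "s_cat": {"cat"},
    "s_tree": {"tree"},
    "s_plant": {"plant"},
}
# Direct hypernymy (hypothetical): synset id -> hypernym synset id.
HYPERNYM: Dict[str, str] = {
    "s_car": "s_vehicle", "s_truck": "s_vehicle",
    "s_dog": "s_animal", "s_cat": "s_animal", "s_tree": "s_plant",
}

QAPair = Tuple[str, List[str], str]  # (q, shuffled answers A, correct answer)


def synsets_of(lu: str) -> Set[str]:
    """Ids of all synsets that contain the given LU."""
    return {sid for sid, lus in SYNSETS.items() if lu in lus}


def build_qa_pairs(with_hypernyms: bool, rng: random.Random) -> List[QAPair]:
    """Build QA pairs <q, A> with one correct answer and three detractors."""
    all_lus = sorted(set().union(*SYNSETS.values()))
    pairs: List[QAPair] = []
    for sid in sorted(SYNSETS):
        lus = sorted(SYNSETS[sid])
        if len(lus) >= 2:
            # Plain WBST: q and the correct answer share a synset.
            candidates = [(q, a) for q in lus for a in lus if q != a]
        elif with_hypernyms and sid in HYPERNYM:
            # WBST+H (assumed reading): a singleton LU is paired with an LU
            # of its direct hypernym synset.
            candidates = [(lus[0], a) for a in sorted(SYNSETS[HYPERNYM[sid]])]
        else:
            candidates = []
        for q, correct in candidates:
            # Detractors must not share a synset with q or the correct answer.
            blocked = synsets_of(q) | synsets_of(correct)
            pool = [lu for lu in all_lus if not (synsets_of(lu) & blocked)]
            if len(pool) < 3:
                continue
            answers = rng.sample(pool, 3) + [correct]
            rng.shuffle(answers)
            pairs.append((q, answers, correct))
    return pairs


def accuracy(pairs: List[QAPair], msr: Callable[[str, str], float]) -> float:
    """Fraction of QA pairs in which the MSR ranks the correct answer first."""
    hits = sum(
        1 for q, answers, correct in pairs
        if max(answers, key=lambda a: msr(q, a)) == correct
    )
    return hits / len(pairs)


def oracle_msr(q: str, a: str) -> float:
    """Stand-in MSR that reads the toy wordnet directly (illustration only)."""
    sq, sa = synsets_of(q), synsets_of(a)
    if sq & sa:
        return 1.0
    if any(HYPERNYM.get(s) in sa for s in sq):
        return 0.5
    return 0.0


rng = random.Random(0)
wbst = build_qa_pairs(with_hypernyms=False, rng=rng)
wbst_h = build_qa_pairs(with_hypernyms=True, rng=rng)
print(len(wbst), len(wbst_h), accuracy(wbst_h, oracle_msr))
```

In this toy data the hypernym extension adds questions only for singleton synsets that have a direct hypernym, which mirrors the coverage motivation given above. A real MSR would replace `oracle_msr` with a similarity function derived from corpus data.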
