06.08.2015 Views

A Wordnet from the Ground Up

A Wordnet from the Ground Up - School of Information Technology ...



56 Chapter 3. Discovering Semantic Relatedness

Several sets of tests compared the results of MSR with human performance. Each time, a subset of QA pairs was randomly selected from a complete WBST+H test, and a group of native speakers of Polish was asked to solve the test. They were instructed to select for each question word only one answer, the closest in meaning to the question. There was no time limit in the task. Most participants were Computer Science students, but the LUs selected were mostly frequent units without technical senses, so the raters’ background need not have influenced the results.

The first two tests for nominal LUs were generated from early versions of the core plWordNet:

• plWordNet from June 2006, 24 native speakers of Polish tested on 2 random subsets of WBST+H; a set included 79 QA pairs; the average score was 89.29%, and inter-judge agreement within one set, measured by Cohen’s kappa (Cohen, 1960), ranged between 0.19 and 0.47 (Piasecki et al., 2007a);
• plWordNet from March 2007, several native speakers of Polish, a random subset of WBST+H; the average result was close to 100%.

The results of the second test showed the limits of WBST+H. Lacking a fuller version of plWordNet, we decided to define a more difficult WBST-style test to facilitate further work on MSRs for Polish nouns. This Enhanced WBST is presented in detail in the next section.

We also ran tests for verbal and adjectival LUs, both generated from the March 2007 version of plWordNet. Twenty raters solved each test of a hundred QA pairs. The participants’ average scores appear in Table 3.3. The inter-judge agreement was measured by Fleiss’s kappa, which accounts for agreement among many participants (Fleiss, 1971). The high value of kappa, supported by the manual evaluation of the test results, shows that the agreement was high, and the raters made similar errors. Examples of QA pairs appear in Figure 3.2. A comparison of the results of human raters on the verbal and adjectival QA pairs (88.21% and 88.9%, respectively) with almost 100% for the nominal pairs shows that the verbal and adjectival parts of WBST+H are more difficult for humans² and that one should expect lower results from the automatically extracted MSRs (Section 3.4.5).

In 2008, another WBST+H was generated for nouns, verbs and adjectives from the final version of the core plWordNet (June 2008). 80 LUs were selected randomly in 4 groups of 20 LUs for each range of LU frequency in the IPI PAN corpus (Przepiórkowski, 2004). We asked invited native speakers of Polish, mainly students of Computer Science, to solve the tests via dedicated Web pages. The results and the number of raters appear in Table 3.4.

² All three tests were generated from the same version of plWordNet.
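For readers who want to reproduce an agreement figure of the kind reported for the verbal and adjectival tests, the following is a minimal sketch of Fleiss's kappa (Fleiss, 1971) in plain Python. It is an illustration, not the authors' evaluation code. The function name `fleiss_kappa`, the input layout (one row per QA pair, one column per answer option, each cell holding the number of raters who chose that option) and the toy data with four answer options per question are all assumptions made for this example.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss's kappa for a table of rating counts.

    counts[i][j] is the number of raters who chose answer option j
    for question i. Every question must be rated by the same number
    of raters (at least two).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_items == 0 or n_raters < 2:
        raise ValueError("need at least one item and two raters")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of ratings")
    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    total = n_items * n_raters
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_categories)]

    # Observed agreement: for each item, the fraction of rater pairs
    # that agree, averaged over all items.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Agreement expected by chance from the category shares alone.
    p_expected = sum(p * p for p in p_cat)

    if p_expected == 1.0:
        # All ratings fall into a single category; kappa is formally
        # undefined. Returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy data: 3 questions, 5 raters, 4 answer options each.
toy_counts = [
    [5, 0, 0, 0],
    [4, 1, 0, 0],
    [3, 0, 2, 0],
]
print(f"Fleiss's kappa on toy data: {fleiss_kappa(toy_counts):.3f}")
```

Treating the chosen answer option as the category is only one possible encoding. Coding each response as correct or incorrect would give a different table and, in general, a different kappa, and the text does not state which encoding was used.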
