A Wordnet from the Ground Up


Chapter 3. Discovering Semantic Relatedness

EWBST, Nouns, plWordNet 12.2006
Q: aromat (aroma)
A: bukiet (bouquet), fetor (stench), smrodek (stink (diminutive)), smród (stink)

EWBST, Nouns, plWordNet 09.2007
Q: aromat (aroma)
A: bukiet (bouquet), fetor (stench), powódź (reason), upał (heat)

EWBST, Nouns, plWordNet 1.0
Q: aromat (aroma)
A: bukiet (bouquet), piorun (thunderbolt), widmo (phantom), zadymka (snowstorm)

WBST+H, Nouns, plWordNet 1.0
Q: aromat (aroma)
A: bukiet (bouquet), faworyzowanie (favouring), harówka (drudgery), matematyka (mathematics)

Figure 3.4: Examples of QA pairs with detractors generated from different versions of plWordNet for the same QA pair

along the structure and less frequently drawn as detractors. The QA pairs generated from broad synsets were often vaguely semantically related and were harder for both tests to differentiate from the question-detractor pairs, which were often also vaguely related.

We also tested raters' performance on EWBST for the needs of future comparisons with the performance of the automatically extracted MSRs. During the first experiment, an example EWBST test generated from the March 2007 plWordNet was given to 32 native speakers of Polish, all of them Computer Science students⁵. The test consisted of 99 QA pairs. All LUs in the test were selected from 5706 single-word noun LUs in plWordNet. In the set of question LUs, 42 LUs occurred more than 1000 times in the IPI PAN corpus (Przepiórkowski, 2004). This subset was distinguished in the test, because such LUs are also the basis of the comparison with the results achieved in (Freitag et al., 2005).

For all QA pairs the average result was 70%, with a minimum of 61.62%, a maximum of 78.79% and a standard deviation from the mean of σ = 4.07%. For the subset consisting of frequent LUs, the average result was 63.24%, with a minimum of 52.38%, a maximum of 73.81% and σ = 5.37%.

⁵ As in experiments with WBST+H, this bias in the background should not influence the results, because the test was composed from plWordNet, which at present includes only general Polish vocabulary.
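The rater percentages reported above can be related back to raw counts of correct answers. The sketch below is only an illustration, not the authors' evaluation code. It assumes each rater's score is the share of correctly answered QA pairs, and it checks that the reported extremes are consistent with whole numbers of correct answers (61 and 78 out of 99 QA pairs; 22 and 31 out of the 42 frequent question LUs). The function names, the example counts passed to `summarize`, and the choice of the population standard deviation are assumptions; the source does not state which estimator was used for σ.

```python
from statistics import mean, pstdev


def accuracy_percent(correct, total):
    """Share of correctly answered QA pairs, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


def summarize(correct_counts, total):
    """Mean, minimum, maximum and standard deviation of per-rater accuracy.

    Assumption: population standard deviation (pstdev); the source does not
    say which estimator was used.
    """
    scores = [accuracy_percent(c, total) for c in correct_counts]
    return {
        "mean": mean(scores),
        "min": min(scores),
        "max": max(scores),
        "std": pstdev(scores),
    }


# The reported extremes match whole numbers of correct answers.
print(round(accuracy_percent(61, 99), 2))  # 61.62 (minimum, all QA pairs)
print(round(accuracy_percent(78, 99), 2))  # 78.79 (maximum, all QA pairs)
print(round(accuracy_percent(22, 42), 2))  # 52.38 (minimum, frequent LUs)
print(round(accuracy_percent(31, 42), 2))  # 73.81 (maximum, frequent LUs)

# Hypothetical per-rater counts, for illustration only (not data from the study).
example = summarize([61, 70, 78], total=99)
print(round(example["min"], 2), round(example["max"], 2))  # 61.62 78.79
```

With the real per-rater counts, `summarize` would give the mean and σ in the same form as the figures quoted in the text.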
