06.08.2015 Views

A Wordnet from the Ground Up


Chapter 4. Extracting Relation Instances

ESPfree- – extends ESPmorf- by the representation of the free order of the instance elements,
EST- – Estratto without generic patterns, exploiting specific features of Polish, e.g. the agreement on values of selected categories is represented,
EST-nm – Estratto without generic patterns, exploiting specific features of the Polish language and the extended reliability measure (4.6),
EST+nm – the same as EST-nm but using generic patterns.

If not stated otherwise, the threshold for confidence is 1.0 for all ESP systems and 2.6 for EST systems. The number k of top patterns was set to k = I + 2, where I is the iteration count. There were four iterations. In those experiments whose results we present, the focus was only on the hyponymy/hypernymy relation. IPIC was selected as the main corpus, on which we ran all experiments with results presented in the tables.

We ran three groups of experiments on Espresso and Estratto (Kurc, 2008). We began with experiments designed to analyse the influence of the proposed extended reliability measure (4.6) and six pattern schemes: ESP-, ESP-nm, ESPmorf-, ESPfree-, EST-nm and EST+nm. The question was whether they improve the results, since they may better cope with certain characteristics of Polish. In Table 4.2, the precision based on human judgement is presented in the column labelled “Hum. eval.”. The levels of precision defined in the column group labelled “Ranking” are achieved for the top subsets of instances described in the column group named “Inst.”.
They show how large a portion of the extracted instances we can rely on, and how strongly. Take, for example, the first row: ESP- extracted the top 8% of instances with the precision above 70% and the top 22% with the precision 60%. The higher the numbers, the higher the concentration of positive instances in the upper part of the extracted list. The precision measured in relation to plWordNet is presented in the column “Prec. plWN” (the number of the extracted plWordNet instances is also given). The column labelled “Rel. R” refers to the recall calculated in relation to the result of ESP-.

The results of the first group of experiments, presented in Table 4.2, allow us to conclude that the modified reliability measure (4.6) performs better both in the case of the original Espresso scheme patterns (ESP-nm is the winner) and in the case of Estratto patterns, which take into account some properties of Polish (EST-nm had the best overall result). The situation is less clear in the case of the precision based on plWordNet – the differences are smaller – but still ESP-nm and EST-nm produce better results than the other versions; plWordNet is relatively small, however, and this could bias the calculation. The manual evaluation showed that in fact plWordNet might be used only for a very rough estimation of precision. The plWordNet-based precision of ESP-nm versus EST-nm is almost identical, but EST-nm is much better in relation to the
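
One way to read the “Ranking” and “Inst.” columns described above is as the largest top portion of the ranked instance list whose precision still reaches a given level. The following is a minimal Python sketch under that assumption; the text does not give the exact procedure, and the function name `top_fraction_at_precision` and the toy judgement list are hypothetical, introduced only for illustration.

```python
from typing import Sequence


def top_fraction_at_precision(judgements: Sequence[bool], level: float) -> float:
    """Return the largest fraction of a ranked list whose top part
    has precision >= level.

    `judgements` holds one human verdict per extracted instance,
    ordered from the highest-ranked instance to the lowest.
    This is an assumed reading of the "Ranking"/"Inst." columns,
    not a procedure confirmed by the source.
    """
    if not judgements:
        return 0.0
    best_size = 0   # size of the largest prefix that meets the level
    correct = 0     # number of positive verdicts seen so far
    for size, is_correct in enumerate(judgements, start=1):
        correct += int(is_correct)
        if correct / size >= level:
            best_size = size
    return best_size / len(judgements)


# Toy ranked list (hypothetical data): positives concentrated at the top.
toy = [True, True, True, False, True, False, False, False, False, False]
print(top_fraction_at_precision(toy, 0.7))
print(top_fraction_at_precision(toy, 0.6))
```

Under this reading, a larger returned fraction for a fixed precision level means that correct instances are concentrated higher in the ranked list, which is the property the table's columns are meant to expose.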
