06.08.2015 Views

A Wordnet from the Ground Up

A Wordnet from the Ground Up - School of Information Technology ...



4.4. Benefits of Extracted Patterns

           Hum. eval.  Ranking           Prec. plWN     Rel. R  Inst.
           [%]         70%   60%   50%   [%]   Inst.
ESP-       39          8     22    43    36    501      1.0     3982
ESP-nm     47          5     14    62    37    561      1.54    6435
ESPmorf-   45          13    18    71    39    361      0.75    2600
ESPfree-   43          9     12    23    29    567      1.36    4621
EST-       54          10    27    –     30    651      1.71    4917
EST-nm     59          42    90    –     35    571      1.7     4457
EST+nm     37          18    32    52    27    1312     2.38    10000

Table 4.2: The influence of the extended reliability measure and changes in the pattern form ("Hum. eval." – precision based on human judgement, "Ranking" – the number of the top instances above the precision threshold, "Prec. plWN" – precision in relation to plWordNet, "Rel. R" – recall relative to ESP-)

manual evaluation. It means that EST-nm, starting from the same seeds acquired from plWordNet, goes beyond the source and extracts many instances which are not described in plWordNet. This is a very promising feature concerning the potential application in expanding plWordNet.

We also observed that the value of the original reliability measure (4.1) decreases very fast. After the sixth iteration it goes far below 10^-12. This explains the drop in the number of newly extracted instances. Applying the modified reliability formula (4.6) circumvents the problem.

Another matter of concern is the scheme of the patterns adjusted for Polish. It is clear that the application of the adjusted patterns produces better precision for EST- and EST-nm in comparison to ESP- and ESP-nm. In the case of EST+nm, utilising the generic patterns, the precision is lower, but its relative recall shows its potential in extracting new instances.
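The "Ranking" columns of Table 4.2 count the top instances that stay above a given precision threshold. The sketch below shows one possible reading of that measure, assuming it is the largest n for which the n highest-ranked instances still reach the threshold. The function name `top_n_above_precision` and the judgement list are hypothetical illustrations, not code or data from the book.

```python
from typing import Sequence


def top_n_above_precision(judgements: Sequence[bool], threshold_percent: int) -> int:
    """Largest n such that the top-n ranked instances have precision >= threshold.

    judgements[i] is True when the instance at rank i + 1 was judged correct.
    This definition is an assumption about how "Ranking" is computed.
    """
    best_n = 0
    correct = 0
    for n, is_correct in enumerate(judgements, start=1):
        correct += int(is_correct)
        # Integer comparison avoids floating-point rounding at the boundary.
        if correct * 100 >= threshold_percent * n:
            best_n = n
    return best_n


# Hypothetical judgements for ten instances ordered by decreasing confidence.
ranked = [True, True, False, True, False, False, True, False, False, False]

for threshold in (70, 60, 50):
    print(threshold, top_n_above_precision(ranked, threshold))
```

Under this reading, a lower threshold can only keep or lengthen the accepted prefix, which matches the non-decreasing values across the 70%, 60% and 50% columns in each row of the table.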
At the cost of reduced precision, the number of extracted instances increases by a factor of 2.38 (the total number of the extracted instances depends on the number of instances above the threshold).

The second group of experiments was performed only for Estratto using generic patterns and the extended reliability measure, i.e. for EST+nm. The aim was to determine the influence of the algorithm parameters on the result. The following dependencies were investigated:

1. the influence of the confidence threshold on the precision of instances achieved within subsequent iterations,
2. the influence of the number of seeds on the induced patterns, and then the influence of the relation between instances and patterns induced by them,
3. the influence of the number of the top k patterns selected for the next iteration on the stability of the algorithm and the precision of instances,
