06.08.2015 Views

A Wordnet from the Ground Up

A Wordnet from the Ground Up - School of Information Technology ...


124 Chapter 4. Extracting Relation Instances

4. the dependency on the filtering of infrequent and very frequent patterns and instances.
5. the way in which various statistical similarity measures used in reliability calculation change the precision of the results.

                        Human Eval. [%]   Relative Recall   Instances
EST+nm:th1.0                         12              0.79       24552
EST+nm:th2.6                         37              1.00       10000
EST+nm:th5.2                         48              0.54        4170
EST+nm:5seeds                        22              0.71       11882
EST+nm:10seeds                       25              0.84       12476
EST+nm:15seeds                       24              0.85       13189
EST+nm:5insts/1patt                  24              0.83       12773
EST+nm:10insts/1patt                 29              1.03       13188
EST+nm:40insts/1patt                 37              1.00       10000
EST+nm:k4                            37              1.00       10000
EST+nm:k8                            41              2.80       25361
EST+nm:k12                           38              2.70       26501

Table 4.3: The dependence of the algorithms on the parameter values (Kurc, 2008)

In case 1 it seems that the highest threshold gives the best results – see the first three rows of Table 4.3 – but too high a threshold decreases the total number of extracted proper instances, as the relative recall is significantly decreased. There must, however, be a balanced ratio between instances selected for the next iteration and new patterns induced. With few instances, there is no statistical evidence to induce proper patterns, and EST/ESP crawls, picking almost random patterns. That leads to a decrease in precision.

Initial seeds, case 2 (marked ‘nnseeds’ in Table 4.3, where nn is the number), are meant to generate a skeleton of a model of the lexico-semantic relation. If the number of seeds is not high enough, the best extracted patterns can be random. Of course, one could collect a small number of seeds that would indicate only expected patterns, but that would require a precise analysis of the corpus used for instance extraction. That is pointless, because by using more seeds one can acquire the same patterns with less effort.

The influence of the number of instances preserved between two subsequent iterations is similar to the influence of the number of seeds; see the rows marked ‘nninsts/1patt’ in Table 4.3 – nn preserved instances for one pattern. More instances kept, and then used for the evaluation of the patterns, give a better description of the whole model. According to the experiments, at least 15 seeds and 10 instances for
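
The precision/recall trade-off in the threshold rows of Table 4.3 can be checked with a little arithmetic. The sketch below rests on an assumption that this page does not state: that relative recall is the estimated number of correct instances (human-evaluated precision times the instance count), normalized by the th2.6 configuration, whose relative recall is 1.00. The function names are illustrative only.

```python
# Threshold rows copied from Table 4.3:
# (configuration, human-evaluated precision in %, relative recall, instances)
THRESHOLD_ROWS = [
    ("EST+nm:th1.0", 12, 0.79, 24552),
    ("EST+nm:th2.6", 37, 1.00, 10000),
    ("EST+nm:th5.2", 48, 0.54, 4170),
]


def estimated_correct(precision_pct, instances):
    """Estimate how many extracted instances are correct."""
    return precision_pct / 100.0 * instances


def ratio_to_baseline(precision_pct, instances, baseline_correct):
    """Correct-instance estimate relative to the baseline configuration.

    ASSUMPTION: this is only a guess at how relative recall was derived;
    the source page does not define the measure.
    """
    return estimated_correct(precision_pct, instances) / baseline_correct


# Baseline: the th2.6 row, which the table lists with relative recall 1.00.
BASELINE_CORRECT = estimated_correct(37, 10000)

for name, precision_pct, table_recall, instances in THRESHOLD_ROWS:
    ratio = ratio_to_baseline(precision_pct, instances, BASELINE_CORRECT)
    print(f"{name}: computed ratio {ratio:.2f}, table value {table_recall:.2f}")
```

Under that assumption the computed ratios land within 0.01 of the tabulated values for these three rows, which illustrates the point made for case 1: the th5.2 setting has the highest precision but yields the fewest correct instances overall.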
