06.08.2015 Views

A Wordnet from the Ground Up


4.4. Benefits of Extracted Patterns

runs of Espresso/Estratto or browsing the corpus to find occurrences of promising LU pairs. A similar problem is with finding the appropriate parameter values. In our experience, trial runs of the algorithm are needed for each corpus used before the results satisfy our expectations.

Additionally, it turned out that in order to maintain a stable representation of relations, there must be an appropriate ratio between patterns and instances. The pattern:instances ratio estimated during experiments is between 1:15 and 1:20. If there are fewer instances, the algorithm becomes unstable. Using more instances results in a longer computation time.

An interesting result is the observation of the "intensifying" patterns. Such patterns do not represent any particular semantic relation. When applied alone, they extract instances of relations of multiple types. When an intensifying pattern is combined with regular ones, it delivers additional statistical evidence for correct but infrequent instances. This lifts the algorithm's precision. An example (Polish w means "in"):

    (hypo/holo:subst:nom) w (hyper/mero:subst:inst)

We observed a problem with the number of instances collected by the ESP+/EST+ versions of the algorithms that use generic patterns. This number is comparable to the number of instances extracted by ESP-/EST-, but one would expect it to be much higher. This might be a result of the characteristic features of the IPIC corpus or of the size of the validating corpus. This problem might be partially solved by using the Web as a validating corpus. Unfortunately, Polish LUs have multiple word forms, so Google queries must be more complicated. The other reason might be the limited expressive power of the patterns, an aspect of the algorithm that should be investigated.

The extended structure of Estratto patterns still seems to miss some lexico-semantic dependencies, especially in stylistically rich text. The experiments on extracting hypernymy from the Internet-based corpus, mostly consisting of literary texts, were unsuccessful. The first step towards strengthening patterns is to take into account possible agreements in elements of the patterns that match the instances. The patterns used in EST are very strict about grammatical categories. For example, the pattern

    (hypo:subst:gen) i inny (hyper:subst:gen)

(two nouns in genitive) is treated as a completely different pattern from

    (hypo:subst:inst) i inny (hyper:subst:inst)

(two nouns in instrumental).
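To make the idea of case agreement concrete, the following is a minimal sketch, not the EST implementation. It assumes that pattern slots are written as (role:pos:case), as in the examples above, and it introduces a hypothetical placeholder CASE to stand for "any case, as long as all slots agree". The function name and the placeholder are illustrative assumptions only.

```python
import re

# Slot syntax assumed from the examples in the text: (role:pos:case).
SLOT = re.compile(r"\(([^:()]+):([^:()]+):([^:()]+)\)")


def generalize_case_agreement(pattern: str) -> str:
    """Replace the case value with a shared placeholder when all slots agree.

    Hypothetical helper: "CASE" is an illustrative marker, not EST notation.
    """
    cases = {m.group(3) for m in SLOT.finditer(pattern)}
    if len(cases) != 1:
        # Slots carry different cases (or there are no slots): leave as-is.
        return pattern
    return SLOT.sub(lambda m: f"({m.group(1)}:{m.group(2)}:CASE)", pattern)


patterns = [
    "(hypo:subst:gen) i inny (hyper:subst:gen)",
    "(hypo:subst:inst) i inny (hyper:subst:inst)",
    "(hypo/holo:subst:nom) w (hyper/mero:subst:inst)",
]
generalized = [generalize_case_agreement(p) for p in patterns]
for original, result in zip(patterns, generalized):
    print(original, "->", result)
```

Under these assumptions, the genitive and instrumental variants of "i inny" collapse into a single pattern with an agreement constraint, while the "w" pattern, whose slots differ in case, is left unchanged.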
