06.08.2015 Views

A Wordnet from the Ground Up

A Wordnet from the Ground Up - School of Information Technology ...


118 Chapter 4. Extracting Relation Instances

jest = be (number=sg, person=3rd), hypo = hyponym, hiper = hypernym, subst = substantive; nom and inst are case values, nominative and instrumental.

Pantel and Pennacchiotti (2006) write that patterns can be induced by any pattern-learning algorithm, but only the longest common substring algorithm proposed by Ravichandran and Hovy (2002) was used. The same algorithm was the basis for the generalisation and unification of patterns in Estratto. The algorithm is heuristically guided by a predefined list of relation-specific LUs. Hypernymy, for example, can be signalled by być (be), stać się (become), taki (such), inny (other), and so on.

In Espresso, the inferred patterns are then generalized by replacing all multiword terms (subsets of noun phrases) by the TR labels. Such an approach might be unworkable for Polish: a robust definition of a multiword term as a regular expression seems unattainable, not to mention the lack of a chunker for Polish. As a slightly different method, matching locations are specified via morphological similarity to contexts (partial morphological specification: part of speech and values for the selected grammatical categories), and via predefined relation-specific LUs.

The instance extraction phase follows pattern induction and selection. An instance is a pair 〈x, y〉 of LUs that exemplifies the target semantic relation. The authors of Espresso suggest that, given a small corpus, two methods can be used to enrich the instance set. First, each multiword LU in an instance can be progressively simplified down to the head, for example, new record of a criminal conviction → new record → record.
A new instance is created with the simplified first LU and the second LU intact. Second, a pattern is instantiated only with either x or y, and new instances are retrieved from the Web or an additional large corpus. For example, given the pair (dog, animal) and the Estratto pattern

(hypo:subst:nom) is a/an (hyper:subst:inst)

we create two queries:

dog is a/an (hyper:subst:inst)
(hypo:subst:nom) is a/an animal

Instances collected using both these methods are added to the instance set. Let us note that in all experiments described by Pantel and Pennacchiotti (2006) only one-word LUs have been used, and the corpora were presumably large enough to provide statistical evidence.

The generalized patterns we described are not considered generic as long as they do not generate ten times more instances than the average number of instances extracted by reliable patterns from the previous iteration. High recall, however, results in decreased precision, so every instance extracted by a generic pattern is verified. The verification process starts with instantiating all non-generic patterns with the instances to be
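
The two steps described above, building one-sided queries from a known pair and flagging a pattern as generic, can be sketched roughly as follows. This is only an illustrative Python sketch, not the Espresso or Estratto implementation. The function names `instantiate_queries` and `is_generic` are hypothetical, and so is the regular expression for the `(role:pos:case)` slot notation. Reading "ten times more" as "at least ten times the average" is also an assumption.

```python
import re

# Hypothetical slot syntax, modelled on the example pattern in the text:
# (role:pos:case), where role is "hypo" or "hyper".
SLOT = re.compile(r"\((hypo|hyper):[^():]+:[^():]+\)")


def instantiate_queries(pattern, hypo, hyper):
    """Return two queries, each with exactly one slot filled by a known LU.

    The other slot is left as a morphological specification, so that it can
    match new LUs in the Web or in an additional large corpus.
    """
    fillers = {"hypo": hypo, "hyper": hyper}
    queries = []
    for role in ("hypo", "hyper"):
        def fill(match, role=role):
            # Replace only the slot for the current role; keep the other one.
            if match.group(1) == role:
                return fillers[role]
            return match.group(0)
        queries.append(SLOT.sub(fill, pattern))
    return queries


def is_generic(n_instances, reliable_counts, factor=10):
    """Assumed test: a pattern is generic if it yields at least `factor`
    times the average number of instances extracted by the reliable
    patterns of the previous iteration."""
    if not reliable_counts:
        raise ValueError("need instance counts from the previous iteration")
    average = sum(reliable_counts) / len(reliable_counts)
    return n_instances >= factor * average


if __name__ == "__main__":
    pattern = "(hypo:subst:nom) is a/an (hyper:subst:inst)"
    for query in instantiate_queries(pattern, "dog", "animal"):
        print(query)
    # dog is a/an (hyper:subst:inst)
    # (hypo:subst:nom) is a/an animal
```

Keeping the unfilled slot as a specification rather than a wildcard mirrors the text: the open position is still constrained by part of speech and case, so only morphologically compatible LUs can fill it.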
