06.08.2015

A Wordnet from the Ground Up

A Wordnet from the Ground Up - School of Information Technology ...

108 Chapter 4. Extracting Relation Instances

Metaphor is a major source of errors, and relations between larger noun phrases, which the patterns assign only to the heads, are an even greater one. A typical situation: NomToNom captures an NLU2 that includes a relative clause, but only its head is considered. Even a genitive nominal modifier or an adjectival modifier often makes the meaning of the noun phrase differ from the lexical meaning of the head. The conditions in mIInne do not constrain the case of the nominal LUs, so it is quite common to erroneously recognize hyponymy for a noun in the genitive that is not the head. It is not easy, however, to identify complex Polish noun phrases in the genitive. The error rate could be reduced by applying a good chunker, or even a shallow parser, combined with an analysis of the meaning relations between structurally related noun phrases (see, for example, Jacquemin, 2001).

Examples of LU pairs extracted by all three patterns appear in Figure 4.3. Figure 4.4 presents examples of LU pairs extracted from the joint corpus by each of the three patterns.

The results of applying lexico-morphosyntactic patterns are valuable, but there remains an impression that more could be achieved by following the main line of the pattern-based paradigm.
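The genitive problem described above can be made concrete with a small example. Below is a minimal Python sketch of a Hearst-style "N1 i inne N2" ("N1 and other N2") pattern. Like a pattern with no case conditions, it takes the noun immediately before "i inne" as the hyponym and does no chunking. Everything in it is a hypothetical illustration, not the actual definition of mIInne:

- the token format;
- the tag labels (subst, adj, conj, nom, gen);
- the function match_and_other;
- the two example phrases.

The optional case-agreement check is only a crude filter. It is not a substitute for the chunker mentioned above.

```python
from typing import List, NamedTuple, Optional, Tuple


class Token(NamedTuple):
    form: str
    lemma: str
    pos: str             # hypothetical tags: "subst" = noun, "adj", "conj"
    case: Optional[str]  # "nom", "gen", ... or None for uninflected tokens


def match_and_other(tokens: List[Token], check_case: bool = False) -> List[Tuple[str, str]]:
    """Return (hyponym, hypernym) lemma pairs for 'N1 i inne N2'.

    N1 is simply the noun right before 'i inne': there is no chunking,
    so a genitive dependent can be taken instead of the phrase head.
    """
    pairs = []
    for i in range(1, len(tokens) - 2):
        left, conj, other, right = tokens[i - 1], tokens[i], tokens[i + 1], tokens[i + 2]
        if conj.form != "i" or other.lemma != "inny":
            continue
        if left.pos != "subst" or right.pos != "subst":
            continue
        # Crude filter: coordinated nouns usually share their case, so a
        # genitive N1 next to a nominative N2 is probably a dependent.
        if check_case and left.case != right.case:
            continue
        pairs.append((left.lemma, right.lemma))
    return pairs


# "wina Francji i inne trunki" = "wines of France and other liquors"
WINES_OF_FRANCE = [
    Token("wina", "wino", "subst", "nom"),
    Token("Francji", "Francja", "subst", "gen"),
    Token("i", "i", "conj", None),
    Token("inne", "inny", "adj", "nom"),
    Token("trunki", "trunek", "subst", "nom"),
]

# "psy i inne zwierzęta" = "dogs and other animals"
DOGS = [
    Token("psy", "pies", "subst", "nom"),
    Token("i", "i", "conj", None),
    Token("inne", "inny", "adj", "nom"),
    Token("zwierzęta", "zwierzę", "subst", "nom"),
]

if __name__ == "__main__":
    print(match_and_other(WINES_OF_FRANCE))                   # [('Francja', 'trunek')]
    print(match_and_other(WINES_OF_FRANCE, check_case=True))  # []
    print(match_and_other(DOGS, check_case=True))             # [('pies', 'zwierzę')]
```

On the toy phrase about wines, the unconstrained match yields the wrong pair (Francja, trunek). This happens because the genitive dependent Francji stands next to "i inne". Requiring case agreement rejects that match but still accepts the phrase about dogs. Rejection is the cautious choice: it trades recall for precision. Recovering the real head (wino) would need the phrase structure that a chunker provides.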
We will now shift our attention to approaches to the automatic extraction and evaluation of more generic patterns.

4.3 Generic Patterns Verified Statistically

Manual construction of lexico-syntactic patterns is not laborious if we rely more on intuition than on an intensive survey of known hypernymy instances and the contexts in which they occur in a corpus. Morin and Jacquemin (1999) proposed the semi-automated discovery of lexico-syntactic patterns. Given a predefined list of hypernymy instances, sentences including these LU pairs are extracted and transformed into “lexico-syntactic expressions”. Next, common environments that generalise the expressions are produced by considering the similarity of the expressions and applying a generalisation procedure: lexico-syntactic patterns describing the commonalities of expression subgroups are deduced. The pattern extraction procedure still assumes manual verification of the deduced patterns, and the patterns are then applied without any automatic evaluation of their accuracy or of the reliability of the extracted pairs. Such evaluation is especially important when generic (weakly constraining) patterns are applied to large corpora.

Manually constructed patterns are claimed to have good precision but very low recall (Hearst, 1998). Recall can be increased by using more generic patterns, extracted automatically from a corpus, which have broad coverage but intrinsically low precision.

Most of the proposed methods follow a common scheme: given initial examples of the target relations, henceforth called seeds, patterns are generated from the corpus and then used to extract further instances. Methods differ in pattern generation