06.08.2015 Views

A Wordnet from the Ground Up

A Wordnet from the Ground Up - School of Information Technology ...


Chapter 4
Extracting Instances of Semantic Relations

Pattern-based approaches originate from the observation that dictionary definitions in Machine-Readable Dictionaries [MRD] follow a limited number of fixed schemes, characterised also by a limited range of syntactic constructions. A typical dictionary definition of a word sense describes it by a genus term followed by a set of differentiae or a short description referring to related words, e.g. (Matsumoto, 2003). Work on the extraction of lexico-semantic relations from MRD started as early as the 1980s (Amsler, 1981), following (Matsumoto, 2003). Dictionary definitions are processed using patterns that identify selected expressions. Patterns are either regular expressions or are written in a formal language of similar expressive power.

In corpora, patterns are used to recognise pairs of LUs as instances of a specified lexico-semantic relation. Consider, for example, a pattern from the seminal work of Hearst (1992):

NP_0 ... such as NP_1, NP_2 ... (and | or) NP_n

It implies that each noun phrase NP_i is a hyponym of the noun phrase NP_0, or, more precisely, that the hypernymy relation holds between the LUs represented in the text by the given noun phrases. Hearst (1992, 1998) manually constructed only five patterns, which were frequently matched in a corpus and appealingly accurate. Accuracy was measured as the ratio of the number of extracted LU pairs linked by the hyponymy relation in PWN to the number of all extracted pairs. For the pattern shown above, for example, 61 of 106 LU pairs extracted from Grolier Encyclopedia were confirmed in PWN (Hearst, 1992).

The implicit assumption here is that one can construct patterns accurate enough to draw correct conclusions from single occurrences of pairs of LUs. In general, however, this seems barely possible, due, among other things, to the presence of metaphor. Without deeper semantic and pragmatic analysis, instances of metaphor may be hard to distinguish from literal uses. Hearst extracted aeroplane as a hyponym of target and Washington as an instance of nationalist; such derived associations are clearly specific to the particular documents from which they were extracted. Another problem is the scarcity of pattern instances in corpora; merely 46 instances were acquired from 20 million words of the New York Times corpus (Hearst, 1992).
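As a rough illustration of how the "such as" pattern and the accuracy measure discussed above could be operationalised, here is a minimal Python sketch. It is not Hearst's implementation. It assumes, for simplicity, that every noun phrase is a single word, whereas a real system would rely on a tagger or chunker to find noun phrases. The names `NP`, `SUCH_AS`, `extract_pairs` and `precision` are hypothetical and introduced only for this example.

```python
import re

# Simplifying assumption of this sketch: a noun phrase is one word.
# The lookahead keeps the conjunctions "and"/"or" from being read as NPs.
NP = r"\b(?!(?:and|or)\b)[A-Za-z][A-Za-z-]*"

# NP_0 [,] such as NP_1, NP_2 ... (and | or) NP_n
SUCH_AS = re.compile(
    rf"(?P<np0>{NP}),?\s+such\s+as\s+"
    rf"(?P<rest>{NP}(?:\s*,\s*{NP})*(?:\s*,?\s*(?:and|or)\s+{NP})?)"
)


def extract_pairs(text):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group("np0").lower()
        for hyponym in re.findall(NP, match.group("rest")):
            pairs.append((hyponym.lower(), hypernym))
    return pairs


def precision(extracted, reference):
    """Share of extracted pairs that also occur in a reference set of pairs."""
    extracted = set(extracted)
    if not extracted:
        return 0.0
    return len(extracted & set(reference)) / len(extracted)


if __name__ == "__main__":
    sentence = "The orchestra includes instruments such as violins, cellos and harps."
    print(extract_pairs(sentence))
```

With the figures quoted above, 61 confirmed pairs out of 106 extracted correspond to a ratio of roughly 0.58. The sketch also shows why single matches are fragile: the regular expression accepts any word sequence of the right shape, so a metaphorical "such as" construction would yield a pair just as readily as a literal one.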
