06.08.2015 Views

A Wordnet from the Ground Up

A Wordnet from the Ground Up - School of Information Technology ...


64 Chapter 3. Discovering Semantic Relatedness

broader semantic associations among LUs [9]. Such contexts tend to extract semantic relatedness sensu largo rather than (more desirable) closer semantic similarity.

In approaches based on lexico-syntactic constraints, a target LU is described by instances of its lexico-syntactic relations with particular LUs. As an example, for the noun bird we find the constraint subject_of(sing) met in texts. Hindle (1990) used a deterministic parser and analysed relations of nouns with verbs as subjects and objects. Two measures, subject similarity and object similarity of two nouns in relation to a given verb, were calculated from the collected frequencies. The final MSR value for a pair of nouns was defined as the sum of both similarities across all verbs. In defining the MSR for 26742 nouns, Hindle used only 4789 verbs for which at least one sentence or clause structure (274613 in total) was recognised by the parser. Lexico-syntactic constraints were also applied to the construction of MSRs by Ruge (1992), Grefenstette (1993), Widdows (2004), and Weeds and Weir (2005).

Lin (1998) applied a shallow dependency parser, MiniPar (Lin, 1993), to the preprocessing and identification of syntactic dependencies that involve nouns. The number of different syntactic relations used for the MSR computation is not given; MiniPar recognises several hundred syntactic dependency relations, about 200 of which describe dependency links involving noun phrase heads. Examples in (Lin, 1998) suggest that many different relations were used in defining lexico-syntactic constraints.
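The Hindle-style measure described above (a subject similarity and an object similarity per verb, summed across all verbs) can be illustrated with a minimal sketch. The text only states that the similarities are calculated from collected frequencies; scoring each verb-noun pair with pointwise mutual information and taking the minimum of two positive scores is an assumption of this example, and all function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def pmi_table(pairs):
    """Pointwise mutual information (log2) for every observed (verb, noun) pair."""
    pair_freq = Counter(pairs)
    verb_freq = Counter(v for v, _ in pairs)
    noun_freq = Counter(n for _, n in pairs)
    total = len(pairs)
    return {
        (v, n): math.log2(f * total / (verb_freq[v] * noun_freq[n]))
        for (v, n), f in pair_freq.items()
    }

def relation_similarity(pmi, verb, noun_a, noun_b):
    """Similarity of two nouns with respect to one verb and one relation.

    Assumption of this sketch: unseen pairs score 0, two positive scores
    contribute their minimum, two negative scores contribute the absolute
    value of the one closer to zero, and mixed signs contribute nothing.
    """
    a = pmi.get((verb, noun_a), 0.0)
    b = pmi.get((verb, noun_b), 0.0)
    if a > 0 and b > 0:
        return min(a, b)
    if a < 0 and b < 0:
        return abs(max(a, b))
    return 0.0

def noun_similarity(subj_pairs, obj_pairs, noun_a, noun_b):
    """Sum of subject and object similarities across all verbs."""
    subj_pmi = pmi_table(subj_pairs)
    obj_pmi = pmi_table(obj_pairs)
    verbs = {v for v, _ in subj_pairs} | {v for v, _ in obj_pairs}
    return sum(
        relation_similarity(subj_pmi, v, noun_a, noun_b)
        + relation_similarity(obj_pmi, v, noun_a, noun_b)
        for v in verbs
    )

# Hypothetical toy data: (verb, noun) pairs as a parser might extract them.
subj_pairs = [("sing", "bird"), ("sing", "canary"), ("fly", "bird"),
              ("fly", "canary"), ("bark", "dog"), ("run", "dog")]
obj_pairs = [("feed", "bird"), ("feed", "canary"), ("walk", "dog")]

print(noun_similarity(subj_pairs, obj_pairs, "bird", "canary"))
print(noun_similarity(subj_pairs, obj_pairs, "bird", "dog"))
```

In this toy data, bird and canary share the verbs sing, fly and feed, so they receive a positive score, while bird and dog share no verb and score zero.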
The correlation of the MSRlist(x,k) lists generated from Lin's MSR with the MSRlist(x,k) lists generated from the PWN-based similarity appeared to be much higher than the correlation with the lists generated from the MSR proposed in (Hindle, 1990). The result showed that the use of a large set of syntactic dependencies, not only those based on the subject and object relations, improves the MSR.

In the experiments on Polish data, we observed progress in WBST+H with the addition of constraints of different types. For example, in the experiments performed for (Piasecki et al., 2007b), the MSRs based on the individual constraints expressing only adjectival modification and only noun coordination achieve 88.65% and 76.85%, respectively, while an MSR based on the combination of both constraints achieves 90.92% in WBST+H. We also made a comparison of MSRs constructed as described in (Piasecki and Broda, 2007):

• LSA applied to a subcorpus of the IPI PAN Corpus [IPIC] (Przepiórkowski, 2004) including 185066 documents from a daily Polish newspaper: 58.07% in WBST+H generated from the core plWordNet,

[9] The same tendency could be observed in similar experiments performed on a corpus of 584 million tokens (the joint corpus, Section 3.4.5) and plWordNet from November 2008. We compared two MSRs extracted for nominal LUs (13285, described in Section 3.4.5): one based on lexico-syntactic constraints and another on pure co-occurrence in a text window of ±5 tokens. The results achieved in the WBST+H test, 88.14% and 75.20% respectively, and 67.95% and 58.86% in EWBST, seem to support the claim that the use of text-window contexts results in less precise discrimination of LU meanings.
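The text-window baseline mentioned in the footnote (pure co-occurrence within ±5 tokens) can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how the co-occurrence vectors are weighted or compared, so raw counts and cosine similarity are choices made for this example, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def window_vectors(tokens, targets, window=5):
    """Count the tokens that occur within +/-window positions of each target."""
    vectors = defaultdict(Counter)
    for i, tok in enumerate(tokens):
        if tok not in targets:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the target occurrence itself
                vectors[tok][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse count vectors (assumed comparison)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

# Hypothetical toy corpus; a real experiment would use a large tokenised corpus.
tokens = "the bird sings in the tree and the canary sings in the cage".split()
vecs = window_vectors(tokens, {"bird", "canary"}, window=5)
print(cosine(vecs["bird"], vecs["canary"]))
```

Because a window records every nearby token regardless of its syntactic role, function words and loosely associated words enter the vectors, which is consistent with the footnote's observation that window contexts discriminate LU meanings less precisely than lexico-syntactic constraints.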
