06.08.2015 Views

A Wordnet from the Ground Up

A Wordnet from the Ground Up - School of Information Technology ...


Chapter 3. Discovering Semantic Relatedness

The comparison of both evaluations performed on two different development versions of MSR_RWF shows that, despite the increasing accuracy of the measure in the WBST+H test, the percentage of wordnet relation instances remains stable. We need additional extraction mechanisms in order to increase the percentage of the target instances in the results and to differentiate between wordnet relations (see Chapter 4).

In line with the planned semi-automatic expansion of the adjective and verb parts of plWordNet, the respective MSRs were extracted using the joint corpus and the MSR_RWF and MSR_GRWF algorithms. The procedures followed the blueprint adopted for the nominal MSR. We acquired two sets, 4668 adjectival lemmas and 17990 verbal lemmas (see footnote 19). They came from the core plWordNet (2618 and 3239, respectively), the small Polish-English dictionary (Piotrowski and Saloni, 1999) and the joint corpus (those occurring ≥ 1000 times).

Both MSRs were tested with WBST+H tests including 2814 QA pairs for adjectival lemmas and 5484 for verbal lemmas. The QA pairs encompass 1574 different adjectival lemmas (among them 959 occur over 1000 times in the joint corpus) and 2960 different verbal lemmas (1902 occur more than 1000 times). Some of them occur in QA pairs more than once but with different near-synonyms.

Tables 3.13 and 3.14 show the results for different MSRs on the same tests for LUs of different frequency. For WBST+H the baseline random selection is 25%.
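A 25% random baseline corresponds to choosing among four candidate answers per question. The sketch below shows how such a multiple-choice synonymy test could be scored against a similarity function. It is only an illustration: the function names, the toy word pairs and the similarity values are assumptions made for the example and do not come from the plWordNet experiments.

```python
from typing import Callable, List, Sequence, Tuple

# One test item: (question word, candidate answers, index of the correct answer).
Question = Tuple[str, Sequence[str], int]


def wbst_accuracy(questions: List[Question],
                  similarity: Callable[[str, str], float]) -> float:
    """Fraction of questions where the most similar candidate is the correct one."""
    correct = 0
    for word, candidates, answer_idx in questions:
        scores = [similarity(word, c) for c in candidates]
        # Index of the highest-scoring candidate (first one wins on ties).
        best = max(range(len(candidates)), key=scores.__getitem__)
        if best == answer_idx:
            correct += 1
    return correct / len(questions)


def random_baseline(n_candidates: int = 4) -> float:
    """Expected accuracy of uniform random guessing among n candidates."""
    return 1.0 / n_candidates


# Hypothetical similarity values, for illustration only.
TOY_SIMILARITY = {
    ("big", "large"): 0.9,
    ("big", "green"): 0.1,
    ("fast", "quick"): 0.2,
    ("fast", "heavy"): 0.6,
}


def toy_similarity(a: str, b: str) -> float:
    # Pairs missing from the table are treated as unrelated.
    return TOY_SIMILARITY.get((a, b), 0.0)


QUESTIONS: List[Question] = [
    ("big", ["large", "green", "slow", "wooden"], 0),
    ("fast", ["heavy", "quick", "blue", "late"], 1),
]

print(wbst_accuracy(QUESTIONS, toy_similarity), random_baseline())
```

In this toy run the measure answers the first question correctly and the second incorrectly, so an accuracy figure is meaningful only relative to the 0.25 that random guessing would achieve.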
We divided the analysed adjectival and verbal lemmas into two groups by their frequency in IPIC: those occurring more than 1000 times and the others. The results for the first group are given in Table 3.13. In Table 3.14 we present results obtained for all LUs.

Working with the same generated co-incidence matrices for verbs and adjectives, we compared the application of RWF with three other measures: Lin's measure (Lin, 1998), CRMI (Weeds and Weir, 2005) and RFF (Geffet and Dagan, 2004). From a large number of proposed solutions, we selected only the measures based on lexico-syntactic features. Lin's measure was included in the set because of its significant influence on subsequent research. CRMI has been extensively compared with several other approaches, showing significant improvement. RFF was chosen for the idea of feature selection present in it. RFF is calculated in two phases: in the first phase features are evaluated and the best 100 are selected; these are then re-weighted and used in the LU similarity calculation in the second phase. In all three approaches the similarity computation is based in some way on Mutual Information weighting, which is also often used by other methods. Finally, the approach of Freitag et al. (2005) is one of the few that deal with the similarity of adjectives and verbs.

In the case of RWF, we also determined experimentally the threshold k for the number of features selected achieving the best results with

Footnote 19: Besides one-word lemmas, we only considered verbs paired with the reflexive particle się.
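The two-phase idea described above for RFF (select the best features, re-weight them, then compute similarity on the reduced vectors) can be illustrated with a minimal sketch. The code is not the RFF formula of Geffet and Dagan (2004): the re-weighting step here is plain length normalisation and the similarity is a cosine, both chosen as stand-ins, and all function names and toy feature weights are hypothetical.

```python
import math
from typing import Dict


def select_top_features(weights: Dict[str, float], k: int = 100) -> Dict[str, float]:
    """Phase 1: keep only the k features with the highest weight."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:k])


def reweight(selected: Dict[str, float]) -> Dict[str, float]:
    """Placeholder re-weighting (an assumption): scale the vector to unit length."""
    norm = math.sqrt(sum(w * w for w in selected.values()))
    if norm == 0.0:
        return {}
    return {feature: w / norm for feature, w in selected.items()}


def similarity(a: Dict[str, float], b: Dict[str, float], k: int = 100) -> float:
    """Phase 2: cosine similarity computed over the selected, re-weighted features."""
    a_sel = reweight(select_top_features(a, k))
    b_sel = reweight(select_top_features(b, k))
    return sum(w * b_sel[f] for f, w in a_sel.items() if f in b_sel)


# Hypothetical feature weights for two lexical units.
lu_a = {"subj_of:run": 3.0, "mod_by:quick": 1.0}
lu_b = {"mod_by:quick": 5.0, "subj_of:run": 1.0}

print(similarity(lu_a, lu_b, k=2))  # both features retained
print(similarity(lu_a, lu_b, k=1))  # each unit keeps a different top feature
```

With k=1 the two units retain disjoint features and the similarity drops to zero, which shows how strongly the feature threshold can shape the result and why such a threshold is worth tuning experimentally.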
