A Wordnet from the Ground Up
4.5.2 Benefits of classifier-based filtering for wordnet expansion

The MSR for the experiments and the values of all attributes were generated from two corpora combined; both were also used in our other experiments. A more detailed description can be found in Section 3.4.5. One was IPIC with ≈ 254 million tokens. The other was the corpus of the daily Rzeczpospolita with ≈ 116 million tokens (Rzeczpospolita, 2008).

MSR_RWF was the same as that proposed by Piasecki et al. (2007b). Its construction was based on only two types of lexico-morphosyntactic constraints: modification by a specific adjective or adjectival participle (AdjC in Section 3.4.3, page 67), and coordination with a specific noun (NcC).

All nouns, adjectives and adjectival participles from the combined corpora were used accordingly as the lexical elements of constraint instances. MSR_RWF provided a description of 13298 nominal LUs and achieved an accuracy of almost 91% in the WBST+H test (see Section 3.3.1) generated from the June 2008 version of plWordNet.

We used plWordNet as the main source of training/test examples. Following the main line of the experimental paradigm of Snow et al. (2005), we generated two sets of LU pairs from plWordNet: Known Hypernyms [KH] and Known Non-Hypernyms [NH]. Our goal is to support linguists by presenting relevant pairs of LUs. Similarly to Snow et al. (2005), we constructed the set of Known Hypernyms from LU pairs 〈a, b〉 where b is a direct hypernym of a or a hypernymic ancestor of a. In contrast with Snow et al. (2005), we allowed only limited hypernymic distances in all KH sets.
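The construction of KH with a bounded hypernymic distance can be sketched as follows. This is only an illustration under stated assumptions, not the procedure actually run on plWordNet: the toy graph `HYPERNYMS`, the helper names `hypernym_distances` and `known_hypernym_pairs`, and the parameter `max_distance` are all hypothetical, and the bound of two arcs is chosen only for the example.

```python
from collections import deque

# Toy hypernymy graph (hypothetical data): each LU maps to its direct hypernyms.
HYPERNYMS = {
    "spaniel": ["dog"],
    "dog": ["mammal"],
    "cat": ["mammal"],
    "mammal": ["animal"],
    "animal": [],
}


def hypernym_distances(graph, lu):
    """Return {ancestor: shortest number of hypernymy arcs from lu to ancestor}."""
    distances = {}
    queue = deque([(lu, 0)])
    while queue:
        node, dist = queue.popleft()
        for parent in graph.get(node, []):
            # Breadth-first search: the first visit gives the shortest path.
            if parent not in distances:
                distances[parent] = dist + 1
                queue.append((parent, dist + 1))
    return distances


def known_hypernym_pairs(graph, max_distance):
    """Collect pairs (a, b) where b is an ancestor of a within max_distance arcs."""
    pairs = set()
    for lu in graph:
        for ancestor, dist in hypernym_distances(graph, lu).items():
            if dist <= max_distance:
                pairs.add((lu, ancestor))
    return pairs


# Assumed bound of two arcs, for illustration only.
kh = known_hypernym_pairs(HYPERNYMS, max_distance=2)
```

With this toy graph, the pair (spaniel, mammal) is kept because it lies two arcs apart, while (spaniel, animal) is excluded because it lies three arcs apart.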
Aiming at a tool to support linguists, we did not want remote associations among positively classified LU pairs.

Hypernymy path length guided experiments with two different divisions of the two groups. We wanted to investigate to what degree we can distinguish closer and more remote hypernyms. We generated four data sets from the April 2008 version of plWordNet:

H: the set of direct hypernym/hyponym pairs (2967 pairs); in all experiments H was included in KH,
P2: pairs of LUs connected by a path of two arcs in the hypernymy graph (2060 pairs); P2 was included in KH,
P3: pairs of LUs connected by a path of three or more hypernymy arcs (1176 pairs), in NH,
R: pairs of words randomly selected from plWordNet in such a way that no direct hypernymy path connects them (55366 pairs, including co-hyponyms), in NH.

After initial experiments, we noticed that the border space between typical elements of KH and NH is not populated well enough, especially considering its importance