A Wordnet from the Ground Up

A Wordnet from the Ground Up - School of Information Technology ...


4.5. Hybrid Combinations

4.5.2 Benefits of classifier-based filtering for wordnet expansion

The MSR for the experiments and the values of all attributes were generated from two corpora combined; both were also used in our other experiments. A more detailed description can be found in Section 3.4.5. One was IPIC, with ≈ 254 million tokens. The other was the corpus of the daily Rzeczpospolita, with ≈ 116 million tokens (Rzeczpospolita, 2008).

MSR_RWF was the same as that proposed by Piasecki et al. (2007b). Its construction was based on only two types of lexico-morphosyntactic constraints: modification by a specific adjective or adjectival participle (AdjC in Section 3.4.3, page 67), and coordination with a specific noun (NcC).

All nouns, adjectives and adjectival participles from the combined corpora were used accordingly as the lexical elements of constraint instances. MSR_RWF provided a description of 13298 nominal LUs and achieved an accuracy of almost 91% in the WBST+H test (see Section 3.3.1) generated from the June 2008 version of plWordNet.

We used plWordNet as the main source of training and test examples. Following the main line of the experimental paradigm of Snow et al. (2005), we generated two sets of LU pairs from plWordNet: Known Hypernyms [KH] and Known Non-Hypernyms [NH]. Our goal is to support linguists by presenting relevant pairs of LUs. Like Snow et al. (2005), we constructed the set of Known Hypernyms from LU pairs 〈a, b〉 where b is a direct hypernym of a or a hypernymic ancestor of a. In contrast with Snow et al. (2005), we allowed only limited hypernymic distances in all KH sets.
Aiming at a tool to support linguists, we did not want remote associations among positively classified LU pairs.

Hypernymy path length guided experiments with two different divisions of the two groups. We wanted to investigate to what degree we can distinguish closer and more remote hypernyms. We generated four data sets from the April 2008 version of plWordNet:

H: pairs of direct hypernym/hyponym (2967 pairs); in all experiments H was included in KH,
P2: pairs of LUs connected by a path of two arcs in the hypernymy graph (2060 pairs); P2 was included in KH,
P3: pairs of LUs connected by a path of three or more hypernymy arcs (1176 pairs); included in NH,
R: pairs of words randomly selected from plWordNet in such a way that no direct hypernymy path connects them (55366 pairs, including co-hyponyms); included in NH.

After initial experiments, we noticed that the border space between typical elements of KH and NH is not populated well enough, especially considering its importance
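
The division into H, P2, P3 and R described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy hypernymy graph, the function names `ancestor_distances` and `build_pair_sets`, and the uniform sampling used for R are all assumptions made for illustration. The graph maps each LU to its direct hypernyms, and path length is taken as the shortest number of hypernymy arcs from an LU to an ancestor.

```python
import random
from collections import deque
from itertools import combinations

# Hypothetical toy hypernymy graph (not plWordNet data):
# each LU maps to the list of its direct hypernyms.
HYPERNYMS = {
    "poodle": ["dog"],
    "dog": ["canine"],
    "canine": ["mammal"],
    "cat": ["feline"],
    "feline": ["mammal"],
    "mammal": ["animal"],
    "animal": [],
}


def ancestor_distances(lu, graph):
    """Return {ancestor: shortest number of hypernymy arcs from lu}."""
    dist = {}
    queue = deque([(lu, 0)])
    while queue:
        node, d = queue.popleft()
        for parent in graph.get(node, []):
            if parent not in dist:  # breadth-first, so first visit is shortest
                dist[parent] = d + 1
                queue.append((parent, d + 1))
    return dist


def build_pair_sets(graph, n_random, seed=0):
    """Split LU pairs into H (1 arc), P2 (2 arcs), P3 (3+ arcs) and R."""
    h, p2, p3 = set(), set(), set()
    connected = set()  # unordered pairs linked by some hypernymy path
    for lu in graph:
        for ancestor, d in ancestor_distances(lu, graph).items():
            connected.add(frozenset((lu, ancestor)))
            if d == 1:
                h.add((lu, ancestor))
            elif d == 2:
                p2.add((lu, ancestor))
            else:
                p3.add((lu, ancestor))
    # R: pairs with no hypernymy path in either direction (co-hyponyms allowed).
    # Uniform sampling is an assumption; the source does not specify the procedure.
    candidates = [pair for pair in combinations(sorted(graph), 2)
                  if frozenset(pair) not in connected]
    rng = random.Random(seed)
    r = set(rng.sample(candidates, min(n_random, len(candidates))))
    return h, p2, p3, r


h, p2, p3, r = build_pair_sets(HYPERNYMS, n_random=3)
known_hypernyms = h | p2       # KH = H + P2
known_non_hypernyms = p3 | r   # NH = P3 + R
print(len(h), len(p2), len(p3), len(r))
```

On this toy graph the sketch yields 6 pairs in H, 5 in P2 and 4 in P3; R is drawn from the 6 remaining unconnected pairs. Capping KH at two arcs mirrors the stated aim of keeping remote associations out of the positive examples.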
