
A Wordnet from the Ground Up


5.2. plWordNet at Three

The total number of LUs described in plWordNet 1.0 is a little above the range declared in the project proposal (15–25 thousand LUs). It compares quite favourably with the wordnets created during the second phase of the EuroWordNet [EWN] project (Vossen et al., 1999, p. 7). We selected the older versions of the corresponding wordnets: they were created from scratch (like plWordNet), and the EWN 2.0 project lasted several years (like plWordNet). The comparison must be taken with a grain of salt, because both the EWN project and the plWordNet project also had goals other than the construction of a wordnet (consider the alignment of wordnets in the former, and automatic methods in the latter). Still, it is more appropriate than a comparison to wordnets developed over a much longer period. The relation between plWordNet and the contemporary wordnets is briefly described at the end of this section.

         Czech WN          Estonian WN       French WN         German WN
         Synsets   LUs     Synsets   LUs     Synsets   LUs     Synsets   LUs
Nouns      9727   13829      5028    8226     17826   24499      9951   13656
Verbs      3097    6120      2650    5613      4919    8310      5166    6778
All       12824   19949      7678   13839     22745   32809     15132   20453

Table 5.2: Selected counts for Czech, Estonian, French and German wordnets built in the second phase of the EuroWordNet project (Vossen et al., 1999, p. 7)

Facts about the Czech, Estonian, French and German wordnets, built in the second phase of the EWN project, appear in Table 5.2.
The nominal part of plWordNet is larger than in the Estonian wordnet, similar in size to the Czech and German wordnets, and significantly smaller only than the French wordnet, whose construction was based on an extensive translation of PWN. The verbal part of plWordNet is smaller than in any of the EWN wordnets. The reason is that the semi-automatic expansion was performed mainly on the nominal part. In 1.51 person-months, we added 60% lemmas and 40% LUs; every element was added and verified by two linguists. The EWN wordnets did not, in practice, include LUs other than nominal and verbal units, while plWordNet contains 3881 adjectival LUs. It should be emphasized, however, that all EWN wordnets are aligned with PWN 1.5 (via a subset of synsets used as a form of inter-lingua). Alignment with PWN was not a research goal in the plWordNet project.

             Including Monosemous Lemmas    Excluding Monosemous Lemmas
             plWordNet    PWN 3.0           plWordNet    PWN 3.0
Nouns          1.317       1.24               2.361       2.79
Verbs          1.286       2.17               2.390       3.57
Adjectives     1.472       1.40               2.749       2.71

Table 5.3: Average polysemy in plWordNet and PWN 3.0
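
The two variants of average polysemy reported in Table 5.3 can be illustrated with a minimal sketch. It assumes the usual definition, which the text does not spell out: the number of LUs divided by the number of distinct lemmas, with lemmas that have only one LU optionally left out. The function name `average_polysemy` and the toy data are hypothetical; they are not taken from plWordNet or from its tools.

```python
from collections import Counter


def average_polysemy(lexical_units, include_monosemous=True):
    """Return the mean number of LUs per lemma.

    lexical_units: iterable of (lemma, sense_id) pairs, one pair per LU.
    include_monosemous: if False, lemmas with exactly one LU are skipped.
    Assumed definition: total LUs / number of lemmas counted.
    """
    senses_per_lemma = Counter(lemma for lemma, _ in lexical_units)
    counts = [
        n for n in senses_per_lemma.values()
        if include_monosemous or n > 1
    ]
    if not counts:
        return 0.0
    return sum(counts) / len(counts)


# Toy data invented for illustration only (not real plWordNet counts).
toy_lus = [
    ("zamek", 1), ("zamek", 2), ("zamek", 3),
    ("pies", 1),
    ("kot", 1), ("kot", 2),
]

with_mono = average_polysemy(toy_lus)                               # 6 LUs / 3 lemmas
without_mono = average_polysemy(toy_lus, include_monosemous=False)  # 5 LUs / 2 lemmas
print(with_mono, without_mono)
```

On the toy data the value rises once the single monosemous lemma is dropped, which mirrors the direction of the difference between the two halves of Table 5.3.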
