A Wordnet from the Ground Up

Chapter 5. Polish WordNet Today and Tomorrow

than for substantive lemmas. Nevertheless, valuable suggestions, of use to linguists, have been generated for most of the new lemmas, including even relatively rare ones.

5.2 plWordNet at Three

The present state of plWordNet (version 1.0) is the result of the productive semiautomatic expansion discussed in Section 4.5.3. We note that it was the linguists who introduced all new synsets and instances of lexico-semantic relations, following automatically generated suggestions.

As discussed in Section 2.5, we chose to measure the size of plWordNet in lemmas and lexical units, but Table 5.1⁴ shows synset numbers too. This facilitates the comparison with other wordnet descriptions in the literature.

                      Nouns   Verbs   Adjectives   All
Lemmas  All           14131   3497    2636         20223
        Monosemous    10839   2777    1924         15477
        Polysemous    3292    720     712          4746
LUs                   18611   4498    3881         26990
Synsets               13675   1860    2160         17695

Table 5.1: The size of plWordNet, version 1.0

Adverbial LUs have not been included in the first version of plWordNet. Instead, we increased the number of nominal and verbal LUs, with a strong emphasis on the former: there are 1.54 times more nominal lemmas than verbal and adjectival lemmas together. The corresponding ratio in PWN is still much higher (“WordNet 3.0 database statistics” in (Miller et al., 2007)): there are 3.57 times more nominal “strings” than verbal and adjectival ones together. The data collected from the joint corpus (Section 4.5.4), automatically processed by a morphosyntactic tagger, show a ratio of 1.45 nominal lemmas (including several hundred multiword LUs from the list prepared for expanding plWordNet, cf. Section 4.5.4) per verbal or adjectival lemma. There is only a moderate nominal LU bias in plWordNet, compared to the state in PWN and the corpus.

There are more LUs in plWordNet than synsets. That is because one LU belongs to exactly one synset but a synset can group several LUs (Table 5.1). Reporting the distinction between the monosemous and polysemous lemmas follows the practice of PWN (Miller et al., 2007).

⁴ The counts describe the state of plWordNet at the moment of writing the book. See plwordnet.pwr.wroc.pl for the up-to-date numbers.
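The relation between LUs and synsets stated above can be checked directly against the counts in Table 5.1. The following is a minimal Python sketch, not part of the plWordNet tooling: the dictionary layout and the helper names `lus_per_synset` and `lemma_split_is_consistent` are assumptions made only for this illustration. It computes the average number of LUs per synset for each part of speech and confirms that the monosemous and polysemous lemma counts add up to the lemma totals.

```python
# Counts copied from Table 5.1 (plWordNet 1.0).
# The structure and all names below are illustrative assumptions.
TABLE_5_1 = {
    "Nouns":      {"lemmas": 14131, "monosemous": 10839, "polysemous": 3292,
                   "lus": 18611, "synsets": 13675},
    "Verbs":      {"lemmas": 3497, "monosemous": 2777, "polysemous": 720,
                   "lus": 4498, "synsets": 1860},
    "Adjectives": {"lemmas": 2636, "monosemous": 1924, "polysemous": 712,
                   "lus": 3881, "synsets": 2160},
    "All":        {"lemmas": 20223, "monosemous": 15477, "polysemous": 4746,
                   "lus": 26990, "synsets": 17695},
}


def lus_per_synset(row):
    """Average number of lexical units grouped by one synset."""
    return row["lus"] / row["synsets"]


def lemma_split_is_consistent(row):
    """Every lemma is either monosemous or polysemous, never both."""
    return row["monosemous"] + row["polysemous"] == row["lemmas"]


for pos, row in TABLE_5_1.items():
    print(f"{pos:<11} LUs per synset: {lus_per_synset(row):.2f}  "
          f"lemma split consistent: {lemma_split_is_consistent(row)}")
```

A value above 1 in every column reflects the point made in the text: each LU belongs to exactly one synset, while a synset may group several LUs.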
