06.08.2015 Views

A Wordnet from the Ground Up

A Wordnet from the Ground Up - School of Information Technology ...


172 Chapter 5. Polish WordNet Today and Tomorrow

Polysemy rates for plWordNet in comparison to the rates of PWN 3.0 appear in Table 5.3. The adjectival part has not been semi-automatically expanded and represents the state from the core plWordNet. Both adjectival polysemy rates in plWordNet are similar to those of PWN, but it is hard to draw any general conclusions: plWordNet is so much smaller than PWN, and it only underwent a partial semi-automatic expansion.

Average number of LUs per synset

         Czech WN   Estonian WN   French WN   German WN   plWordNet
Nouns    1.42       1.64          1.37        1.37        1.36
Verbs    1.98       2.12          1.69        1.31        2.42

Table 5.4: The average number of LUs per synset in plWordNet and four EWN wordnets (second phase) (Vossen et al., 1999, p. 7).

The definition of a synset in plWordNet, based on linguistic criteria (Section 2.1), may, on the face of it, lead to very small synsets, most of them with just one LU. Encouragingly, then, the average number of LUs per synset in the nominal part of plWordNet is 1.36. This number is very close to those obtained for the three largest EWN wordnets (see Table 5.4). The ratio for verbal synsets in plWordNet is rather high, but that part has been expanded only to a small degree.
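The ratio discussed here is the total number of LUs divided by the number of synsets. A minimal sketch of that computation follows, together with the share of synsets of each size. The synsets and LU names are hypothetical placeholders, not plWordNet data, and the function names are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical toy data: each synset is a set of lexical units (LUs).
# These placeholder strings are NOT taken from plWordNet.
synsets = [
    {"lu_a1", "lu_a2", "lu_a3"},
    {"lu_b1"},
    {"lu_c1"},
    {"lu_d1", "lu_d2"},
]


def average_lus_per_synset(synsets):
    """Total number of LUs divided by the number of synsets."""
    return sum(len(s) for s in synsets) / len(synsets)


def size_distribution(synsets):
    """Map each synset size n to the percentage of synsets with exactly n LUs."""
    counts = Counter(len(s) for s in synsets)
    return {n: 100.0 * c / len(synsets) for n, c in sorted(counts.items())}


avg = average_lus_per_synset(synsets)
dist = size_distribution(synsets)
print(avg)   # 7 LUs over 4 synsets
print(dist)  # percentage of synsets per size
```

In this toy example half of the synsets are singletons, yet the average is well above 1, which shows why the average and the singleton share are worth reporting separately.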
We can expect a decrease in the ratio of verbal LUs per synset during later stages of semi-automatic expansion. For the nominal part of plWordNet, the ratio decreased from 3.13 in version 12.2006 to 1.36 in version 1.0.

Percentage of synsets including n LUs [%]

n            1      2      3      4     5     6     7     8     9     ≥ 10
Nouns        79.50  12.67  4.22   1.84  0.91  0.34  0.20  0.12  0.07  0.13
Verbs        27.31  36.34  19.89  8.39  4.35  1.67  0.97  0.43  0.27  0.38
Adjectives   56.90  23.33  11.76  3.84  2.31  0.65  0.60  0.23  0.14  0.24

Table 5.5: Synset sizes.

In Table 5.5 we take a closer look at the distribution of the synset sizes over the three parts of plWordNet. Each number in the table shows what percentage of the synsets belonging to the given part (nominal, verbal, adjectival) include the particular number of LUs. For the nominal part, the majority of synsets are singletons, but there is a significant percentage of 2- and 3-element synsets. The numbers gradually decrease with increasing n. As a result of plWordNet expansion, many new non-singleton synsets were created or complemented with new LUs.

The largest existing nominal synset includes emotionally marked nominal LUs: {głupiec ‘fool’, głąb ‘noodle’, baran ‘blockhead’, osioł ‘donkey’, gamoń ‘nincompoop’, matoł ‘nitwit’, ćwierćinteligent
