06.08.2015 Views

A Wordnet from the Ground Up

A Wordnet from the Ground Up - School of Information Technology ...


2.5. The Final State of plWordNet Core

A common practice in describing the size of a wordnet is to report the number of synsets it contains. This practice can be traced back to the original concept of PWN according to which a wordnet is an inventory of senses expressed by synsets. In our approach, LUs are the centrepiece of the wordnet. For many wordnet applications the numbers of lemmas (in the sense [11] introduced in Section 1.2) and the corresponding LUs (Section 2.1) are the most important characteristics of a lexical resource, representing its coverage and applicability. That is why we prefer to describe the size of the core plWordNet in lemmas and LUs, but for the sake of comparison we include the number of synsets, by part of speech. See Table 2.1.

             Including Monosemous Lemmas   Excluding Monosemous Lemmas
Nouns        1.405                         2.586
Verbs        1.281                         2.406
Adjectives   1.472                         2.754

Table 2.2: Average polysemy in the core plWordNet

Around half of the lemmas employed in the construction of the core plWordNet were selected from the list of 10000 most frequent lemmas in IPIC. The remainder is due to the linguists’ additions required to complete certain synsets, and to the attempts to translate the upper levels of PWN’s hypernymy structure for the three parts of speech.

             Percentage of lemmas belonging to the n synsets [%]
             1      2      3      4     5     6     7     8     9     ≥ 10
Nouns        74.46  16.15  5.92   2.17  0.74  0.36  0.15  0.05  0.00  0.00
Verbs        80.04  14.21  4.17   0.99  0.40  0.19  0.00  0.00  0.00  0.00
Adjectives   73.10  15.74  6.61   2.71  0.99  0.08  0.27  0.15  0.15  0.20

Table 2.3: The number of synsets to which a lemma belongs in the core plWordNet

             Percentage of synsets including the n lexical units [%]
             1      2      3      4     5     6     7     8     9     ≥ 10
Nouns        65.91  19.50  7.91   3.82  1.38  0.62  0.36  0.13  0.17  0.20
Verbs        21.45  39.09  22.11  9.55  4.41  1.93  0.48  0.36  0.18  0.44
Adjectives   56.67  23.65  11.71  3.87  2.29  0.61  0.61  0.23  0.14  0.22

Table 2.4: Sizes of synsets in the core plWordNet

The division of lemmas into monosemous and polysemous, further illustrated in Table 2.2, was inspired by a similar practice of PWN (Miller et al., 2007). The polysemy statistics illustrate the general character of lemmas included in the core.

[11] Technically, a lemma is the basic morphological form of a given word form, produced by morphosyntactic disambiguation in context.
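The average polysemy figures of Table 2.2 can be related to the distribution in Table 2.3 by treating each percentage as a weight on the number of synsets n. The following is a minimal sketch of that arithmetic, not the authors' procedure: the names `SYNSETS_PER_LEMMA` and `average_polysemy` are illustrative, and the open-ended "≥ 10" column is assumed to count as exactly 10, which makes the result a lower bound whenever that column is non-zero.

```python
# Percentages copied from Table 2.3: share of lemmas belonging to n synsets.
# Positions correspond to n = 1..9; the last position is the ">= 10" bucket.
SYNSETS_PER_LEMMA = {
    "Nouns": [74.46, 16.15, 5.92, 2.17, 0.74, 0.36, 0.15, 0.05, 0.00, 0.00],
    "Verbs": [80.04, 14.21, 4.17, 0.99, 0.40, 0.19, 0.00, 0.00, 0.00, 0.00],
    "Adjectives": [73.10, 15.74, 6.61, 2.71, 0.99, 0.08, 0.27, 0.15, 0.15, 0.20],
}


def average_polysemy(percentages, include_monosemous=True):
    """Weighted mean number of synsets per lemma.

    Assumption: the final ">= 10" bucket is counted as exactly 10,
    so the value is a lower bound when that bucket is non-empty.
    """
    pairs = list(enumerate(percentages, start=1))
    if not include_monosemous:
        # Drop lemmas that belong to a single synset (n = 1).
        pairs = [(n, share) for n, share in pairs if n > 1]
    total_share = sum(share for _, share in pairs)
    weighted_sum = sum(n * share for n, share in pairs)
    return weighted_sum / total_share


if __name__ == "__main__":
    for pos, row in SYNSETS_PER_LEMMA.items():
        with_mono = average_polysemy(row, include_monosemous=True)
        without_mono = average_polysemy(row, include_monosemous=False)
        print(f"{pos}: {with_mono:.3f} (including), {without_mono:.3f} (excluding)")
```

Under these assumptions the noun and verb rows agree with Table 2.2 to three decimal places. The adjective row comes out marginally lower than the published values, which is expected given that lemmas in the "≥ 10" bucket are counted as having exactly 10 synsets.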
