A Wordnet from the Ground Up

A Wordnet from the Ground Up - School of Information Technology ...

Merge Model: the selection is done in a local resource, and the synsets and their language-internal relations are first developed separately, after which the equivalence relations to WordNet 1.5 are generated.

Expand Model: the selection is done in WordNet 1.5, and the WordNet 1.5 synsets are translated (using bilingual dictionaries) into equivalent synsets in the other language. The wordnet relations are taken over and, where necessary, adapted to EuroWordNet. Monolingual resources may also be used to verify the wordnet relations imposed on non-English synsets.

It has been observed that the expand model can lead to a wordnet biased by WordNet 1.5. For many languages, however, either no electronic monolingual resources (extended monolingual dictionaries or thesauri) are available, or the existing resources are small, often with limited information in their entries. It has been suggested that for such languages the expand model can work well in wordnet development.

Within EWN, the expand model was adopted for the Spanish and French wordnets. Several other wordnet projects later adopted it as well, including the Croatian WordNet (Raffaelli et al., 2008) and the Hungarian WordNet (Miháltz et al., 2008).

A wordnet constructed following the merge model should describe lexico-semantic relations closer to the spirit of the given language, in that it is less influenced by the design decisions made in a wordnet for another language (most likely English), often of a significantly different type.
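The two models differ mainly in where the new wordnet's relations come from. The Python sketch below illustrates that contrast under loud assumptions; it is not the EuroWordNet procedure. Synsets are modelled as frozensets of lemmas, hypernymy as (hyponym, hypernym) pairs, and bilingual dictionaries as plain dicts. The helper names `expand` and `merge`, the lemma-overlap heuristic used to generate equivalences, and the toy data are all hypothetical. The adaptation and verification steps of the expand model are omitted.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical representation (not EuroWordNet's data model):
# a synset is a frozenset of lemmas; hypernymy is a set of (hyponym, hypernym) pairs.
Synset = FrozenSet[str]
Hypernymy = Set[Tuple[Synset, Synset]]


def expand(src_synsets: List[Synset], src_hyper: Hypernymy,
           en_to_pl: Dict[str, List[str]]):
    """Expand model sketch: translate source synsets, then copy their relations."""
    mapping: Dict[Synset, Synset] = {}
    for s in src_synsets:
        translated = frozenset(t for lemma in s for t in en_to_pl.get(lemma, []))
        if translated:  # a synset with no dictionary translation is dropped
            mapping[s] = translated
    # Relations are inherited from the source wordnet, not derived from the target language.
    hyper = {(mapping[c], mapping[p]) for c, p in src_hyper
             if c in mapping and p in mapping}
    equiv = {(t, s) for s, t in mapping.items()}  # (target, source) pairs
    return set(mapping.values()), hyper, equiv


def merge(local_synsets: List[Synset], local_hyper: Hypernymy,
          src_synsets: List[Synset], pl_to_en: Dict[str, List[str]]):
    """Merge model sketch: keep local synsets and relations; link to the source last."""
    equiv = set()
    for t in local_synsets:
        translations = {e for lemma in t for e in pl_to_en.get(lemma, [])}
        # Assumed heuristic: choose the source synset sharing the most lemmas.
        best = max(src_synsets, key=lambda s: len(s & translations), default=None)
        if best is not None and best & translations:
            equiv.add((t, best))  # (target, source) pairs
    return set(local_synsets), set(local_hyper), equiv


# Hypothetical toy data; "canine" is deliberately left untranslated.
dog, canine, animal = frozenset({"dog"}), frozenset({"canine"}), frozenset({"animal"})
en_synsets = [dog, canine, animal]
en_hyper = {(dog, canine), (canine, animal)}
pies, zwierze = frozenset({"pies"}), frozenset({"zwierzę"})

_, exp_hyper, exp_equiv = expand(en_synsets, en_hyper,
                                 {"dog": ["pies"], "animal": ["zwierzę"]})
_, mrg_hyper, mrg_equiv = merge([pies, zwierze], {(pies, zwierze)}, en_synsets,
                                {"pies": ["dog"], "zwierzę": ["animal"]})
print(exp_hyper)  # set(): both English links pass through "canine"
print(mrg_hyper == {(pies, zwierze)})  # True: the local link is kept
```

With this toy data the expand run yields no hypernymy links at all, because both English links pass through the untranslated `canine`; in this sketch an expanded wordnet's relations can only ever be images of source relations. The merge run keeps the locally chosen link from `pies` to `zwierzę` and adds the equivalences to the source wordnet only afterwards.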
The merge model, however, requires rich resources at the outset, for example, a monolingual dictionary with identified senses, detailed definitions, thematic codes for senses and some semantic structuring. Such resources are created for human readers, so constructing a wordnet from them is more than merely a matter of copying [10]; see (Pedersen and Nimb, 2008) for the use of such resources in the DanNet project. The difference is also clear when one compares PWN with LDCE (Bullon et al., 2003), or plWordNet with the dictionary of Dubisz (2004).

1.3.2 Why we chose the merge approach

No electronic dictionary on which we could base the construction of a Polish wordnet was available [11]. In addition, we did not want to consider an indiscriminate mapping of PWN, and we dismissed the idea of translating it into Polish. As a result, we decided to build plWordNet from scratch. On the other hand, we wanted to keep plWordNet

[10] If a dictionary contains rich information structured in a way that facilitates NLP, we face another question: is a wordnet the best way of describing lexical semantics for NLP? We lack the experience to answer such a question, because Polish, unfortunately, is not blessed with such abundance.

[11] The existing Polish electronic dictionaries, for example (Dubisz, 2004) or (PWN, 2007), are not freely available for research, and in any event their structure limits their usefulness.