06.08.2015 Views

A Wordnet from the Ground Up

A Wordnet from the Ground Up - School of Information Technology ...


4.5. Hybrid Combinations

suggested fruit and vegetable names (morela ‘apricot’ or pietruszka ‘parsnip’), names of spices (tymianek ‘thyme’) or alcohols (rum ‘rum’) as direct or indirect hyponyms of the appropriate existing nodes (food, plant, spice, alcohol).

Graph reconstruction did not always work so well. Occasionally, it resulted in suggested links between nouns that were unexpected (papużka ‘budgerigar’ → chomik ‘hamster’ or delfin ‘dolphin’ → rybka ‘little fish’) or even random. The system was less helpful for large or general domains such as person or place names; misses were more frequent in such cases¹⁶. Sometimes LUs were linked quite accurately, but not by hyponymy/hypernymy or synonymy. Instead, the relation was either meronymy (for example, among nouns denoting body parts), fuzzynymy or (less often) relatedness.

Let us note that WNW sometimes also served as a tool for discovering errors in the wordnet. For example, the unit rój ‘hive, colony’ was inappropriately linked with the synset grupa ludzi ‘a group of people’. The hypernymy tree was missing a node for a group of animals, present in the database but not linked by hyponymy/hypernymy with other LUs from the same semantic field. Similarly, the mislinked LU holiday (with hyponyms Saturday and Sunday) pointed to poorly arranged relations in the synset day: workday (with hyponyms Monday through Friday) complements holiday. Incomplete hyponym/hypernym trees were also identified. For example, the mislinked new lemma Koran ‘The Koran’ showed the absence in plWordNet of the hypernym (holy scriptures) as well as co-hyponyms (The Bible). A completely haphazard placement of the unit partykuła ‘particle’ helped uncover the absence of part of speech, noun, verb, adjective, and so on.

Examples

Figures 4.9–4.16 show examples of WNW’s suggestions. The very accurate attachment point (not only the area) suggested for nikiel ‘nickel’ exemplifies WNW’s very good performance in domains such as chemical substances and elements, plants or animals. Those are domains of a taxonomical character, but they are also well described by the patterns run on the joint corpus. We show in WNW all synsets from a connected subgraph that received a positive score in relation to the subject LU, but only the local maximum (a synset marked by the blue border) is the final singular attachment point. The whole subgraph represents the attachment area.

¹⁶ This subjective observation comes from the linguist who worked with WNW. It has been partially contradicted by the statistical data recorded in the database that registered the linguist’s every decision. The data show that the most hits at the level of new lemmas were observed in the domain of persons, so the observation may have been caused by the lower number of direct hits in this domain. This example is a good illustration of possible discrepancies between objective, statistically based evaluation and the usability of a tool for its users.
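
The distinction between an attachment area and an attachment point, as described under Examples, can be illustrated with a minimal sketch. It is not WNW's actual implementation: the function name, the edge list, the synset labels and the score values are all hypothetical, and we assume that the "local maximum" is simply the highest-scoring synset within each connected subgraph of positively scored synsets.

```python
from collections import deque


def attachment_areas(edges, scores):
    """Group positively scored synsets into connected subgraphs (areas)
    and pick the highest-scoring synset of each area as its attachment point.

    Assumption: 'local maximum' means the maximum score within one area.
    """
    positive = {s for s, v in scores.items() if v > 0}

    # Undirected adjacency, restricted to positively scored synsets.
    neighbours = {s: set() for s in positive}
    for a, b in edges:
        if a in positive and b in positive:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen, areas = set(), []
    for start in sorted(positive):
        if start in seen:
            continue
        # Breadth-first search collects one connected subgraph.
        area, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            area.add(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        # Sorting first makes the choice deterministic when scores tie.
        point = max(sorted(area), key=lambda s: scores[s])
        areas.append((point, area))
    return areas


# Hypothetical hypernymy links and scores for a new lemma such as 'nikiel'.
edges = [
    ("substance", "chemical element"),
    ("chemical element", "metal"),
    ("metal", "coin"),
]
scores = {
    "substance": 0.2,
    "chemical element": 0.6,
    "metal": 0.9,
    "coin": 0.0,    # not positive, so outside the area
    "food": -0.1,
}

areas = attachment_areas(edges, scores)
print(areas[0][0])  # metal
```

In this toy case the area consists of three synsets, while only one of them would be marked as the attachment point.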
