06.08.2015 Views

A Wordnet from the Ground Up

A Wordnet from the Ground Up - School of Information Technology ...


5.3. Lessons Learned

We have tested many methods of extracting lexico-semantic relations (Chapters 3 and 4). None of them ensures quality comparable to manual work. While the accuracy was often good, the problem was to find the limits. For example, Measures of Semantic Relatedness [MSR] produce continuous results, defined for any pair of lemmas, while the extraction based on lexico-syntactic patterns produces LU pairs that represent different shades of some semantic relation. It is very hard to construct a general automatic mechanism that defines the border between those potential relation instances which are correlated with the linguist’s judgment, and those which are not. It becomes easier when we consider automatic expansion of an existing wordnet. The linguistic knowledge already represented by the wordnet structures helps increase the trustworthiness of the automated additions. Consider the promising results of the WordNet Weaver application (Section 4.5.4).

A core plWordNet should contain the upper levels of the hypernymy hierarchy, but it is very hard to construct it top-down without compromising the linguistic nature of the lexical network: one can unwittingly “slip” into an abstract ontology (taxonomy). More general LUs have few true hypernyms, and it is difficult to distinguish between their direct and indirect hyponyms. Bottom-up work might be safer, but it too has a drawback: it requires a proper selection⁷ of the more specific LUs, so as to “activate” a wide range of more general LUs at the end of the process of core plWordNet construction. The problem is to make the selection in such a way that we can get an exhaustive set of the most general LUs if we just keep describing hypernyms of the specific LUs. Our experience suggests a strong preference for the bottom-up approach. It worked especially well during the semi-automatic expansion phase.

Our experiments with the WordNet Weaver [WNW], a tool for semi-automatic wordnet expansion (Sections 4.5.3-4.5.4), were generally encouraging. 8361 new lemmas, 10537 new LUs, 8729 synsets and 11063 instances of lexico-semantic relations have been added to the core plWordNet at the cost of 3.4 person-months. Every decision assisted by WNW was verified by a coordinator, and many improvements were made to the initial plWordNet structure.

It is hard to separate the time spent on correcting the core plWordNet from the time spent on expanding it. WNW allowed us to discover many errors in the core plWordNet structure: trying to attach new lemmas to the existing structure often brought out the drawbacks of that structure. WNW’s suggestions are less helpful for gerunds (it is an open problem how to reconcile the description of gerunds by MSRs and by patterns). The percentage of usefully suggested attachments varies across domains.

We initially assumed a model of first generating part of the plWordNet structure and then correcting it semi-automatically. That did not work well at all. A linguist could be lost if required to make hundreds of corrections in a continuously evolving wordnet structure. It seems better to correct the proposed attachments one by one.

⁷ Such selection is necessarily constrained, not least by financial considerations.
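The bottom-up selection problem described in this section can be illustrated with a small sketch. It assumes the hypernymy relation is available as a plain mapping from each LU to its direct hypernyms; the function names (`reachable_hypernyms`, `missed_general_lus`) and the toy English data are hypothetical and are not taken from plWordNet or from the WordNet Weaver. The sketch only checks which general LUs a given choice of specific LUs would "activate" if hypernyms are described repeatedly.

```python
from collections import deque


def reachable_hypernyms(hypernyms, seeds):
    """Return every LU reached from the seed LUs by following hypernym links upward."""
    seen = set()
    queue = deque(seeds)
    while queue:
        lu = queue.popleft()
        for parent in hypernyms.get(lu, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def missed_general_lus(hypernyms, seeds, general_lus):
    """General LUs that a bottom-up pass starting from `seeds` never reaches."""
    return set(general_lus) - reachable_hypernyms(hypernyms, seeds)


# Toy hypernymy fragment (invented stand-ins, not plWordNet data).
toy_hypernyms = {
    "poodle": ["dog"],
    "dog": ["animal"],
    "oak": ["tree"],
    "tree": ["plant"],
    "animal": ["organism"],
    "plant": ["organism"],
}
general = {"animal", "plant", "organism"}

# Starting only from "poodle" leaves "plant" unreached.
print(missed_general_lus(toy_hypernyms, ["poodle"], general))
# Adding "oak" to the selection covers all three general LUs.
print(missed_general_lus(toy_hypernyms, ["poodle", "oak"], general))
```

In this toy setting a poorly chosen seed set silently omits part of the upper hierarchy, which is the risk the bottom-up approach has to manage. In real work the set of most general LUs is not known in advance, so a check of this kind could only be run against a provisional list.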
