06.08.2015 Views

A Wordnet from the Ground Up


4.5. Hybrid Combinations

Snow et al. (2006) manually evaluated the first n automatically added hypernymic links. Because n was up to 20000 in the last experiment, only randomly selected samples were assessed. The uniform sample size of 100 applied for n > 1000 was too small to generalise the evaluation results to the whole sets with sufficient statistical confidence. Among the different types of evaluation performed, the fine-grained one seems to be the most interesting from our point of view. The evaluators were asked: “is X a Y?”, where 〈X, Y〉 is an added link. It is not clear in (Snow et al., 2006) whether only direct hyponym/hypernym pairs counted as positive. For each pair of nouns 〈i, j〉, where i is unknown, the algorithm finally selects only one sense of j, so only the best hit is added or evaluated. Under this setting of fine-grained evaluation, the achieved precision of 84% for n = 10000 is high, but may be hard to compare with other approaches, including ours (to follow soon), because it is given only for the best hit and the basic criterion (cited above) is not precise.

Alfonseca and Manandhar (2002) and Witschel (2005) use only one knowledge source and work locally on the hypernymy tree. Each decision is based on the properties of the currently processed node. Widdows (2003) considers a broader context of several points in the structure but also uses only one type of evidence. Caraballo (1999) combined two types of information, patterns and MSR, but the extracted structure seems to be too far from the proper hypernymy structure. Snow et al. (2006) combine two knowledge sources and utilise not only the vertical structure but also the horizontal structure of cousins (direct and indirect co-hyponyms). Their assumption was that the results of all classifiers can be described probabilistically, which is not the case for lexico-syntactic patterns. One of their classifiers is based on processing a corpus by parsing, a step not feasible for many natural languages.

How To Combine Extraction Algorithms

In the previous sections we have reported on several methods of extracting lexico-semantic relations [LSR] for Polish. None of them individually has reached the accuracy level required in a support tool for linguists. We will now investigate combinations of the following methods:

• a measure of semantic relatedness based on the Rank Weight Function, written as MSR_RWF, developed for Polish nouns and presented in (Piasecki et al., 2007a): MSR_RWF extracts closely semantically related LUs with high accuracy, but the extracted LU pairs belong to a range of LSRs, not only to the typical wordnet relations;
• post-filtering of the LU pairs produced by MSR_RWF with a classifier presented in Section 4.5.1, here called C_H: the percentage of LSR instances increases among the filtered pairs;
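The post-filtering idea in the second item can be illustrated with a minimal sketch. It assumes that the relatedness measure is available as a table of scored LU pairs and that the classifier is a yes/no function over pairs. All names (`top_k_related`, `post_filter`, `toy_msr`, `toy_classifier`) and all scores are hypothetical and serve only as illustration; they are not the MSR_RWF or C_H implementations described in the text.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]  # (lexical unit, candidate related lexical unit)


def top_k_related(msr: Dict[Pair, float], lu: str, k: int) -> List[Pair]:
    """Return the k pairs starting with `lu` that have the highest scores."""
    scored = [(pair, score) for pair, score in msr.items() if pair[0] == lu]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [pair for pair, _ in scored[:k]]


def post_filter(pairs: List[Pair], classifier: Callable[[Pair], bool]) -> List[Pair]:
    """Keep only the pairs that the classifier accepts."""
    return [pair for pair in pairs if classifier(pair)]


# Hypothetical relatedness scores; a real measure would be computed from a corpus.
toy_msr: Dict[Pair, float] = {
    ("kot", "pies"): 0.81,
    ("kot", "ssak"): 0.74,
    ("kot", "mleko"): 0.69,
    ("kot", "dom"): 0.12,
}

# Hypothetical stand-in for a trained classifier: accepts pairs assumed to
# be instances of wordnet relations (co-hyponymy, hypernymy).
ACCEPTED = {("kot", "pies"), ("kot", "ssak")}


def toy_classifier(pair: Pair) -> bool:
    return pair in ACCEPTED


candidates = top_k_related(toy_msr, "kot", k=3)
filtered = post_filter(candidates, toy_classifier)
print(candidates)
print(filtered)
```

In this toy setting the pair 〈kot, mleko〉 is closely related but is not an instance of a wordnet relation, so the filter drops it, and the share of LSR instances among the remaining pairs goes up.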
