06.08.2015 Views

A Wordnet from the Ground Up

A Wordnet from the Ground Up - School of Information Technology ...


4.5. Hybrid Combinations

... than some upper-bound distance d_h. The distance is simply the number of hypernymy links to traverse from t. The evidence for the surroundings is treated as less reliable than that for LUs in the central synset t, from the perspective of considering t as the point of attachment. Any information that describes relations of a new lemma with LUs in synsets other than t is related to t only indirectly, by wordnet links. The weight of the context evidence decreases in proportion to the distance.

We consider several sources of heterogeneous evidence for a potential relation of a new lemma with a LU already in the wordnet and thus a relation with some synset. The results of all extraction methods were transformed to sets of LU pairs 〈x, y〉 such that x and y are semantically related according to the given method and the corpora analysed. There are three groups of sets:

• two sets produced using MSR_RWF – the list MSRlist(y, k) of the k units most related to y, and that list restricted to bidirectional relations:
  MSR_BiDir(y, k) = {y′ : y′ ∈ MSRlist(y, k) ∧ y ∈ MSRlist(y′, k)};
• one set generated by the classifier C_H applied to filtering MSRlist(y, k) from LUs not in hypernymy, meronymy or synonymy with x; C_H was trained on the data from plWordNet;
• three sets produced by the manually constructed lexico-syntactic patterns and one set generated by the patterns produced by Estratto.

There is only partial overlap among the sources, so we will use them all in expanding the wordnet.
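
The bidirectional restriction in the first group can be made concrete with a small sketch. The relatedness lists below are invented toy data and the function names are hypothetical; only the set definition of MSR_BiDir(y, k) comes from the text.

```python
from typing import Dict, List, Set

# Hypothetical toy data: for each lexical unit, its most related units,
# ordered by some measure of semantic relatedness (MSR). Not real output.
MSR_LISTS: Dict[str, List[str]] = {
    "dog": ["cat", "puppy", "animal"],
    "cat": ["dog", "kitten", "animal"],
    "puppy": ["dog", "kitten", "cat"],
    "animal": ["organism", "creature", "beast"],
}


def msr_list(y: str, k: int) -> List[str]:
    """Return the k units most related to y (empty list if y is unknown)."""
    return MSR_LISTS.get(y, [])[:k]


def msr_bidir(y: str, k: int) -> Set[str]:
    """Keep y2 from MSRlist(y, k) only if y also occurs in MSRlist(y2, k)."""
    return {y2 for y2 in msr_list(y, k) if y in msr_list(y2, k)}


print(sorted(msr_bidir("dog", 3)))  # ['cat', 'puppy']
```

In this toy data "animal" is dropped for "dog" because "dog" does not appear among the three units most related to "animal", so the relation holds in one direction only.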
We assume that the subsequent methods explore different pieces of partial information available in corpora. We assume, too, that the application of many different methods allows the use of as much lexico-semantic information as possible. Different sources are differently reliable; this can be estimated e.g. by manual evaluation of the accuracy of the extracted pairs. We want to trust the different sources to a different degree: we introduce mechanisms of weighted voting.

The algorithm of Activation-area Attachment

The algorithm is based on the idea of a semantic fit: between two lemmas, as representing two LUs linked by a LSR, and between a lemma and a synset, as defining a LU. The fit is identified from all evidence found in corpora. Next, we group synsets which fit the input lemma into activation areas, from which the attachment areas are selected and returned. The attachment areas represent LUs which may have different senses of the given new lemma; the senses are identified from the input data delivered to the algorithm.
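
The idea of trusting sources to different degrees can be illustrated by a minimal weighted-voting sketch. It is not the Activation-area Attachment algorithm itself: the source names, the weights and the additive scoring rule are all assumptions made for illustration, and the pairs are invented.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (new lemma x, LU y already in the wordnet)

# Hypothetical evidence: each source contributes a set of related pairs.
SOURCES: Dict[str, Set[Pair]] = {
    "msr_list": {("spaniel", "dog"), ("spaniel", "cat")},
    "msr_bidir": {("spaniel", "dog")},
    "classifier": {("spaniel", "dog")},
    "patterns": {("spaniel", "animal")},
}

# Hypothetical reliability weights, e.g. derived from the manually
# evaluated accuracy of each source's pairs. The values are made up.
WEIGHTS: Dict[str, float] = {
    "msr_list": 0.25,
    "msr_bidir": 0.5,
    "classifier": 0.5,
    "patterns": 0.75,
}


def weighted_vote(pair: Pair, sources: Dict[str, Set[Pair]],
                  weights: Dict[str, float]) -> float:
    """Sum the weights of all sources whose pair set contains `pair`."""
    return sum(weights[name] for name, pairs in sources.items() if pair in pairs)


def rank_candidates(lemma: str, sources: Dict[str, Set[Pair]],
                    weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Score every LU paired with `lemma` by any source, best score first."""
    candidates = {y for pairs in sources.values() for (x, y) in pairs if x == lemma}
    scored = [(y, weighted_vote((lemma, y), sources, weights)) for y in candidates]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


print(rank_candidates("spaniel", SOURCES, WEIGHTS))
# [('dog', 1.25), ('animal', 0.75), ('cat', 0.25)]
```

Here "dog" ranks first because three sources agree on it, while "animal" outranks "cat" because its single supporting source carries a higher assumed weight.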
