06.08.2015 Views

A Wordnet from the Ground Up

A Wordnet from the Ground Up - School of Information Technology ...



possibility of constructing an MSR significantly better in this respect. Semantic and pragmatic constraints make many LUs semantically related to many other LUs, and MSRs based on distributional semantics generate a continuum of relatedness values for pairs of LUs. Wordnet relations appear as just weakly identifiable characteristic subspaces in the continuum of semantic relatedness. We need an additional way of selecting those LU pairs from the MSRlist(x,k) lists which represent particular wordnet relations. Two ways appear to emerge:

1. application of lexico-syntactic patterns (Sections 3.2 and 4) as an additional source of knowledge,
2. introduction of an additional classifier trained on the plWordNet data and used for filtering out MSRlist(x,k) pairs which are not instances of any wordnet relation (Section 4.5.1).

We also mentioned briefly, in the discussion of verbal and adjectival MSRs, the idea of changing the perspective from automatic extraction of sets of instances of the wordnet relations to expanding the existing wordnet with new lemmas anchored in existing synsets. This can significantly extend the amount of knowledge available and reduce the complexity of the problem. We will present in Section 4.5.3 a solution following this idea.
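The selection step described above can be illustrated with a minimal sketch. It reads MSRlist(x,k) as the k units most related to x, which is an assumption about the notation. Cosine similarity over invented toy vectors stands in for a real MSR, and the `is_relation_instance` predicate is a hypothetical placeholder for either a pattern match or a trained classifier; none of these names come from the book.

```python
import math

# Toy co-occurrence vectors; the lemmas and counts are invented for illustration.
VECTORS = {
    "dog":   [5.0, 1.0, 0.0],
    "puppy": [4.0, 2.0, 0.0],
    "cat":   [4.0, 0.0, 1.0],
    "car":   [0.0, 1.0, 5.0],
}


def cosine(u, v):
    """Cosine similarity, used here as a stand-in for a real MSR."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def msr_list(x, k, vectors=VECTORS):
    """Return the k units most related to x as (unit, score), best first."""
    scored = [(y, cosine(vectors[x], vectors[y])) for y in vectors if y != x]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def filter_pairs(x, k, is_relation_instance):
    """Keep only those (x, y) pairs that the predicate accepts."""
    return [(x, y) for y, _ in msr_list(x, k) if is_relation_instance(x, y)]


# Hypothetical predicate: a hand-made set replaces a classifier or pattern match.
KNOWN = {("dog", "puppy")}
result = filter_pairs("dog", 2, lambda x, y: (x, y) in KNOWN)
print(result)
```

In this toy data "cat" is ranked as closely related to "dog" by the similarity measure, yet the predicate rejects it. This mirrors the point that high relatedness alone does not identify an instance of a wordnet relation.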
Let us emphasise here that it is automated wordnet expansion which was our assumed goal, not automatic wordnet construction from scratch.

3.5 Sense Discovery by Clustering

The synset is one of the most fundamental building blocks of the wordnet structure. An algorithm for automatic extraction of synsets would be very helpful for linguists who build a wordnet up manually (though usually with substantial software support).

Clustering groups objects on a hyperplane so as to minimise the distance between objects inside a group and maximise the distance between objects from different groups. A definition of distance, or similarity, between objects is required for such grouping. For clustering of lemmas into synset-like groups, we could use directly a Measure of Semantic Relatedness [MSR] (Section 3.4). A drawback would be that MSRs tend to merge different lemma senses in one vector that represents the meaning of lemmas, or to over-represent one predominant sense of a given lemma (Piasecki et al., 2007a). That is why we need a clustering method aware of ambiguity in lemma meaning.

The Most Frequent Sense heuristic states that in one genre or domain one sense of a given lemma is dominant (Agirre and Edmonds, 2006). Without thematically labelled corpora one can hope that clustering techniques make it possible to achieve approximation of domains, because documents are grouped by similarity. On the other
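The basic idea of clustering lemmas directly with a relatedness measure, as described in this section, can be sketched as follows. This is only an illustration under stated assumptions: cosine similarity over invented toy vectors stands in for an MSR, average-link agglomerative clustering with a fixed threshold is one arbitrary choice of method, and all names are hypothetical. It gives each lemma a single vector, so it does not address the sense ambiguity that the text identifies as the main drawback.

```python
import math
from itertools import combinations

# Toy vectors, one per lemma; the lemmas and counts are invented for illustration.
VECTORS = {
    "car":        [5.0, 1.0, 0.0],
    "automobile": [4.0, 2.0, 0.0],
    "river":      [0.0, 1.0, 5.0],
    "stream":     [0.0, 2.0, 4.0],
}


def cosine(u, v):
    """Cosine similarity, used here as a stand-in for a real MSR."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average_link(a, b, vectors):
    """Mean pairwise similarity between the members of two clusters."""
    sims = [cosine(vectors[x], vectors[y]) for x in a for y in b]
    return sum(sims) / len(sims)


def cluster(vectors, threshold):
    """Merge the two most similar clusters until none reaches the threshold."""
    clusters = [[lemma] for lemma in sorted(vectors)]
    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), average_link(clusters[i], clusters[j], vectors))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        # i < j always holds, so deleting index j leaves index i valid.
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return sorted(clusters)


print(cluster(VECTORS, threshold=0.8))
```

With a threshold of 0.8 the toy lemmas fall into two synset-like groups. A polysemous lemma would still be forced into exactly one group, which is why an ambiguity-aware method is needed.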
