06.08.2015 Views

A Wordnet from the Ground Up

A Wordnet from the Ground Up - School of Information Technology ...




Chapter 3. Discovering Semantic Relatedness

LUs manifest themselves in text by occurrences of language expressions. That is why we can also extend the application of this hypothesis to the meaning of LUs.

The Distributional Hypothesis allows one to assess the commonalities between LU meanings by measuring the similarity of contexts in which they occur (via language expressions). Grammatical relations are recognised in text only to some degree of accuracy, so in the general case we should rather talk about measuring the strength of semantic relatedness between LUs, not the exact semantic relations between them. This semantic relatedness is a correlate of the likelihood that two LUs can occur in the same type of contexts.

A Measure of Semantic Relatedness [MSR], briefly discussed in Section 3.2, is a function that assigns a real value to the semantic relatedness of two LUs by comparing the descriptions of their distribution across different contexts in the corpus.

High recall is an intrinsic property of an MSR. An MSR finds a value of the strength of relatedness for almost any pair of LUs. Moreover, for a given LU x and a large enough value of k, one can expect many LUs related to x by one of the PWN relations among the k LUs most semantically related to x; henceforth, we will denote this set of LUs by MSRlist(x,k). In practice, however, we mostly see a low accuracy of MSRs, measured as the cut-off precision of the MSRlist(x,k) list calculated in comparison to relation instances extracted from a wordnet (Section 3.3) for a fixed value of k, such as 20.
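The notions of an MSR, MSRlist(x,k) and cut-off precision can be illustrated with a minimal Python sketch. It rests on several assumptions that are not in the text: cosine similarity over raw context counts stands in for the MSR (the text does not fix a particular function here), the toy co-occurrence counts and the single wordnet relation pair are invented for illustration, and all names (`CONTEXTS`, `msr`, `msr_list`, `cutoff_precision`, `WORDNET_PAIRS`) are hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data: each LU is described by counts of the
# context features it co-occurs with in a (fictional) corpus.
CONTEXTS = {
    "dog": Counter({"bark": 4, "pet": 3, "eat": 2}),
    "cat": Counter({"pet": 3, "eat": 2, "purr": 4}),
    "puppy": Counter({"bark": 2, "pet": 2, "eat": 1}),
    "car": Counter({"drive": 5, "park": 3}),
}

# Hypothetical relation instances, standing in for pairs extracted from a wordnet.
WORDNET_PAIRS = {frozenset(("dog", "puppy"))}


def msr(x, y, contexts=CONTEXTS):
    """Assumed MSR: cosine similarity of the two context-count vectors."""
    a, b = contexts[x], contexts[y]
    dot = sum(a[f] * b[f] for f in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def msr_list(x, k, contexts=CONTEXTS):
    """The k LUs most related to x (ties broken alphabetically)."""
    others = [y for y in contexts if y != x]
    others.sort(key=lambda y: (-msr(x, y, contexts), y))
    return others[:k]


def cutoff_precision(x, k, related_pairs, contexts=CONTEXTS):
    """Fraction of msr_list(x, k) that is linked to x in the wordnet pairs."""
    top = msr_list(x, k, contexts)
    if not top:
        return 0.0
    hits = sum(1 for y in top if frozenset((x, y)) in related_pairs)
    return hits / len(top)


print(msr_list("dog", 2))                         # two nearest LUs to "dog"
print(cutoff_precision("dog", 2, WORDNET_PAIRS))  # precision at cut-off k = 2
```

In this toy setting every LU pair receives some value, which mirrors the high recall noted above, while only part of the top-k list is confirmed by the wordnet pairs, which is what the cut-off precision measures.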
See the results of the experiments later in this section. Nevertheless, despite the expected problems with accuracy, and because of the expected high recall, our goal for the first step of constructing tools for semi-automatic expansion of plWordNet was to build an MSR for Polish with a relatively high accuracy with respect to the core plWordNet. We planned to achieve this by working with a very large corpus, to increase the number of examples of LU use, and by using a rich description of contexts based on the analysis of morphosyntactic dependencies among LU occurrences. We expected to extract an MSR more focused on semantic similarity, one which returns a large percentage of LUs associated with x by synonymy or hypernymy among MSRlist(x,k) for some LU x and a small value of k. The idea is to let the linguist browse the whole MSRlist(x,k) comfortably. Preliminary experiments also suggested that linguists might not accept less than 50% of correct instances of lexico-semantic relations on the list of suggestions.

3.4.2 Context and its description

The construction of an MSR requires two decisions first: on the context size (or granularity of the LU meaning description) and on the types of constraints used as context description. The decisions are correlated. For example, with context that exceeds sentence boundaries, the description cannot be based only on lexico-syntactic relations (most syntactic relations do not hold outside a sentence). Two main lines of work emerge in the literature – MSR extraction based on:
