24.07.2013 Views

Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...


Chapter 5

Synset Discovery

As mentioned in the previous chapter, an LKB structured in words, instead of concepts, does not handle lexical ambiguity and might lead to serious inconsistencies. To deal with that issue, wordnets are structured in synsets, which are groups of words sharing a common meaning and thus representing a concept. This chapter is about the discovery of synsets from a term-based LKB, which is the first step in moving towards a sense-aware resource.

Since a synset groups words according to their synonymy, in this step we only use the network established by the synonymy triples extracted from dictionaries. On the one hand, co-occurrence graphs extracted from corpora have proven useful for identifying not only synonymous words, but also word senses (Dorow, 2006). It should be mentioned that, unlike words connected by other kinds of relation, synonymous words share similar neighbourhoods but may not co-occur frequently in corpus text (Dorow, 2006), which leads to few textual patterns connecting this kind of words. So, as mentioned in section 3.2.2, most work on synonymy (or near-synonymy) extraction from corpora relies on the application of mathematical models (e.g. Turney (2001)), including graphs, clustering algorithms, or both (e.g. Dorow (2006)). On the other hand, in synonymy networks extracted from dictionaries, clusters tend to express concepts (Gfeller et al., 2005) and can therefore be exploited for the establishment of synsets. Methods for improving the organisation of synonymy graphs, extracted from different resources, are presented by Navarro et al. (2009).

As other authors noticed for PAPEL (Prestes et al., 2011), we confirmed that synonymy networks extracted from dictionaries connect more than half of the words by at least one path. Therefore, as others did for discovering new concepts from text (e.g. Lin and Pantel (2002)), we used a (graph) clustering algorithm on our synonymy networks. This kind of work is related to WSD. More specifically, it can be seen as word sense induction (WSI, Navigli (2012)), as it discovers possible concepts of a word without exploiting an existing sense inventory.
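The observation that most words in a dictionary-based synonymy network are linked by at least one path can be checked by measuring the share of words in the largest connected component. The following is a minimal sketch of that check, not the procedure used in this work: the triples, the relation label `synonym_of` and all function names are hypothetical and serve only as illustration.

```python
from collections import defaultdict, deque

# Hypothetical synonymy triples (word, relation, word); not data from this work.
TRIPLES = [
    ("car", "synonym_of", "automobile"),
    ("automobile", "synonym_of", "motorcar"),
    ("bank", "synonym_of", "shore"),
    ("bank", "synonym_of", "financial_institution"),
    ("happy", "synonym_of", "glad"),
]


def build_graph(triples):
    """Build an undirected synonymy graph as a dict: word -> set of neighbours."""
    graph = defaultdict(set)
    for first, _relation, second in triples:
        graph[first].add(second)
        graph[second].add(first)
    return dict(graph)


def connected_components(graph):
    """Return the connected components of the graph as a list of word sets."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        component = set()
        queue = deque([start])
        # Breadth-first traversal from the first unseen word.
        while queue:
            word = queue.popleft()
            component.add(word)
            for neighbour in graph[word]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


def largest_component_share(graph):
    """Fraction of all words that belong to the largest connected component."""
    components = connected_components(graph)
    return max(len(component) for component in components) / len(graph)


graph = build_graph(TRIPLES)
components = connected_components(graph)
share = largest_component_share(graph)
```

In this toy graph the largest component holds three of the eight words. In a network where one component dominates, as reported for dictionary-based networks, connected components alone would lump many distinct concepts together, which is one reason a clustering algorithm is needed instead.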

As discussed in section 4.3, from a linguistic point of view, word senses are not discrete, so their representation as crisp objects does not reflect human language. A more realistic approach for coping with this fact is to represent synsets as models of uncertainty, such as fuzzy sets, to handle word senses and natural language concepts. Our clustering algorithm can be used for the discovery of fuzzy synsets. The fuzzy membership of a word in a synset can be interpreted as the confidence level about using this word to indicate the meaning of the synset.
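To make the notion of a fuzzy synset concrete, it can be pictured as a mapping from words to membership degrees in [0, 1]. The sketch below assumes that representation; the words, the degrees and the function names are hypothetical, and the code does not reproduce the clustering algorithm of this chapter. It only shows how a crisp synset could be recovered from a fuzzy one with a threshold (an alpha-cut).

```python
# Hypothetical fuzzy synset: word -> membership degree in [0, 1].
# The words and values are illustrative, not results from this work.
fuzzy_synset = {"bank": 0.9, "shore": 0.7, "margin": 0.4, "edge": 0.15}


def alpha_cut(synset, threshold):
    """Return the crisp set of words whose membership is at least `threshold`."""
    return {word for word, degree in synset.items() if degree >= threshold}


def ranked(synset):
    """Return (word, degree) pairs sorted from highest to lowest membership."""
    return sorted(synset.items(), key=lambda item: item[1], reverse=True)


crisp = alpha_cut(fuzzy_synset, 0.5)
ordering = ranked(fuzzy_synset)
```

Under the interpretation given above, a higher threshold keeps only the words that most confidently indicate the meaning of the synset, while a lower one yields a larger but less reliable synset.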
