Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...
Chapter 5
Synset Discovery

As mentioned in the previous chapter, an LKB structured in words, instead of concepts,
does not handle lexical ambiguity and might lead to serious inconsistencies. To
deal with that issue, wordnets are structured in synsets, which are groups of words
sharing a common meaning and thus representing a concept. This chapter is about
the discovery of synsets from a term-based LKB, which is the first step in moving
towards a sense-aware resource.

Since a synset groups words according to their synonymy, in this step, we only
use the network established by the synonymy triples extracted from dictionaries.
On the one hand, co-occurrence graphs extracted from corpora have been shown to be
useful for identifying not only synonymous words, but also word senses (Dorow,
2006). It should be mentioned that, in contrast to other kinds of relation, synonymous
words share similar neighbourhoods, but may not co-occur frequently
in corpus text (Dorow, 2006), which leads to few textual patterns connecting
this kind of words. So, as mentioned in section 3.2.2, most of the works on synonymy
(or near-synonymy) extraction from corpora rely on the application of mathematical
models (e.g. Turney (2001)), including graphs, clustering algorithms, or
both (e.g. Dorow (2006)). On the other hand, in synonymy networks extracted
from dictionaries, clusters tend to express concepts (Gfeller et al., 2005) and can
therefore be exploited for the establishment of synsets. Methods for improving the
organisation of synonymy graphs, extracted from different resources, are presented
by Navarro et al. (2009).

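The remark that synonymous words share similar neighbourhoods without necessarily co-occurring can be illustrated with a small sketch. The co-occurrence data below is invented for illustration, and Jaccard overlap is only one of several possible neighbourhood similarity measures; it is not claimed to be the measure used in the cited works.

```python
def jaccard(a, b):
    """Jaccard overlap between two neighbour sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy, invented co-occurrence graph: word -> words it co-occurs with.
# Note that "car" and "automobile" never co-occur with each other.
cooccurrence = {
    "car": {"drive", "road", "engine", "park"},
    "automobile": {"drive", "road", "engine", "dealer"},
    "banana": {"eat", "fruit", "peel"},
}

# Neighbourhood similarity of a synonymous pair and of an unrelated pair.
sim_syn = jaccard(cooccurrence["car"], cooccurrence["automobile"])
sim_other = jaccard(cooccurrence["car"], cooccurrence["banana"])

print(f"car / automobile: {sim_syn:.2f}")
print(f"car / banana:     {sim_other:.2f}")
```

In this toy graph the synonymous pair shares three of five distinct neighbours, while the unrelated pair shares none, even though neither pair is directly linked.
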
As other authors noticed for PAPEL (Prestes et al., 2011), we confirmed that
synonymy networks extracted from dictionaries connect more than half of the words
by at least one path. Therefore, as others did for discovering new concepts from
text (e.g. Lin and Pantel (2002)), we used a (graph) clustering algorithm on our
synonymy networks. This kind of work is related to WSD. More specifically, it
can be seen as word sense induction (WSI, Navigli (2012)), as it discovers possible
concepts of a word without exploiting an existing sense inventory.

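The connectivity observation above can be checked with a few lines of code. The following is a minimal sketch, assuming triples of the form (word, relation, word); the relation label `synonym_of` and the toy triples are hypothetical, not taken from the actual resource. It only measures how many words fall into the largest connected component and is not the clustering algorithm used in this chapter.

```python
from collections import defaultdict, deque


def build_graph(triples, relation="synonym_of"):
    """Build an undirected graph from the triples holding the given relation."""
    graph = defaultdict(set)
    for a, rel, b in triples:
        if rel == relation:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Return the connected components as a list of word sets (BFS)."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


# Invented toy triples; the non-synonymy triple is ignored.
triples = [
    ("car", "synonym_of", "automobile"),
    ("automobile", "synonym_of", "motorcar"),
    ("motorcar", "synonym_of", "auto"),
    ("bench", "synonym_of", "seat"),
    ("car", "hypernym_of", "taxi"),
]

graph = build_graph(triples)
components = connected_components(graph)
largest = max(components, key=len)
share = len(largest) / len(graph)
print(f"{len(components)} components; largest covers {share:.0%} of the words")
```

When a single component covers most of the words, as reported for dictionary-based synonymy networks, connected components alone are too coarse to act as synsets, which is why a clustering step is needed.
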
As discussed in section 4.3, from a linguistic point of view, word senses are
not discrete, so their representation as crisp objects does not reflect human
language. A more realistic approach for coping with this fact is to represent synsets
with models of uncertainty, such as fuzzy sets, to handle word senses and natural
language concepts. Our clustering algorithm can be used for the discovery of fuzzy
synsets. The fuzzy membership of a word in a synset can be interpreted as the
confidence level about using this word to indicate the meaning of the synset.
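
As an illustration of this representation, a fuzzy synset can be sketched as a mapping from words to membership degrees in [0, 1]. The words and membership values below are invented, and the helper `alpha_cut` is a hypothetical name for the standard fuzzy-set operation of keeping the members above a threshold; how memberships are actually computed is not shown here.

```python
def alpha_cut(fuzzy_synset, threshold):
    """Return the crisp set of words whose membership is at least `threshold`."""
    return {word for word, mu in fuzzy_synset.items() if mu >= threshold}


# Invented fuzzy synset: word -> membership degree in [0, 1].
fuzzy_synset = {
    "car": 1.0,
    "automobile": 0.9,
    "vehicle": 0.4,
    "wagon": 0.15,
}

# A crisp synset is recovered by keeping only high-confidence members.
crisp = alpha_cut(fuzzy_synset, 0.5)
print(sorted(crisp))
```

Raising the threshold yields smaller synsets with more confident members; lowering it admits words that only loosely indicate the meaning of the synset.
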