Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...
Chapter 6<br />
Thesaurus Enrichment<br />
General language dictionaries and language thesauri cover the same kind of knowledge,<br />
but represent it differently. While the former consist of lists of word senses<br />
and respective natural language sense descriptions, the latter group synonymous<br />
words together, so that they can be seen as possible lexicalisations of concepts.<br />
WordNet (Fellbaum, 1998) can actually be seen as a resource that bridges the gap<br />
between both kinds of resources, because each synset contains a textual gloss.<br />
However, in previous chapters, we have shown that, even though they intend to<br />
cover the same kind of knowledge, most of the information in public handcrafted<br />
Portuguese thesauri is complementary to the information extracted from dictionaries.<br />
Therefore, it should be more fruitful to integrate their information in Onto.PT<br />
instead of using them merely as a reference for comparison. Another aspect in favour<br />
of this option is that, besides its size, TeP was manually created by experts. This<br />
means that, beyond integrating the information in TeP, we can take advantage<br />
of its structure to obtain more reliable synsets and more controlled sense granularity.<br />
The work presented in this chapter can be seen either as an alternative or as a<br />
complement to the previous chapter, as we use the synsets of TeP as a starting<br />
point for the construction of a broader thesaurus. To this end, we follow a four-step<br />
approach for enriching an existing electronic thesaurus, structured in synsets,<br />
with information extracted from electronic dictionaries, represented as synonymy<br />
pairs (synpairs)<sup>1</sup>:<br />
1. Extraction of synpairs from dictionary definitions;<br />
2. Assignment of synpairs to suitable synsets of the thesaurus;<br />
3. Discovery of new synsets after clustering the remaining synpairs;<br />
4. Integration of the new synsets in the thesaurus.<br />
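The data flow of steps 2 to 4 can be pictured with a minimal Python sketch. It is only an illustration under strong assumptions. The function names are hypothetical. The assignment criterion (a synpair joins every synset that already contains one of its two words) is a deliberately naive placeholder, not the assignment algorithm presented in this chapter. Connected components stand in for whichever graph clustering procedure is actually used in step 3.

```python
from collections import defaultdict


def assign_synpairs(synsets, synpairs):
    """Step 2 (naive placeholder): add a synpair to every synset
    that already contains at least one of its two words."""
    original = [frozenset(s) for s in synsets]
    enriched = [set(s) for s in original]
    remaining = []
    for a, b in synpairs:
        # Match against the original synsets so the result does not
        # depend on the order in which synpairs are processed.
        hits = [i for i, s in enumerate(original) if a in s or b in s]
        if hits:
            for i in hits:
                enriched[i].update((a, b))
        else:
            remaining.append((a, b))
    return enriched, remaining


def cluster_synpairs(synpairs):
    """Step 3 (stand-in): treat synpairs as edges of a graph and
    return its connected components as candidate new synsets."""
    graph = defaultdict(set)
    for a, b in synpairs:
        graph[a].add(b)
        graph[b].add(a)
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            word = stack.pop()
            if word not in component:
                component.add(word)
                stack.extend(graph[word] - component)
        seen |= component
        clusters.append(component)
    return clusters


def enrich_thesaurus(synsets, synpairs):
    """Steps 2-4: assign synpairs, cluster the rest, integrate."""
    enriched, remaining = assign_synpairs(synsets, synpairs)
    return enriched + cluster_synpairs(remaining)


# Toy data, invented for illustration only.
thesaurus = [{"carro", "automóvel"}, {"casa", "lar"}]
synpairs = [("carro", "viatura"), ("alegre", "contente"), ("contente", "feliz")]
print(enrich_thesaurus(thesaurus, synpairs))
```

In this toy run, the first synpair extends an existing synset, while the two remaining synpairs share a word and are therefore grouped into a single new synset. A realistic assignment step has to be more selective than simple word overlap, because ambiguous words occur in several synsets.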
In step 1, any approach for the automatic acquisition of synpairs from dictionaries,<br />
such as the one described in chapter 4, may be followed. Therefore, we will not<br />
discuss this step further. We start this chapter by presenting its main contribution,<br />
which is the algorithm for the automatic assignment of synpairs to synsets. Then,<br />
we evaluate the algorithm against a gold standard and select the most adequate<br />
settings for using it in the enrichment of TeP. Any graph clustering procedure suits<br />
step 3 of our approach. We chose to follow an approach similar to the one introduced<br />
<sup>1</sup> Synpairs are synonymy tb-triples. They can be extracted from several sources; however, as we<br />
are dealing with general language knowledge, dictionaries are the obvious targets.