Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...
5.4. Discussion
discovered synsets result in CLIP, a Portuguese thesaurus larger than the public-domain
handcrafted Portuguese thesauri. Comparing CLIP with those thesauri led us to the
conclusion that an even larger thesaurus can be obtained if all of them are integrated.
Given that OT.PT, the smallest thesaurus used in our experimentation, is currently used
for suggesting synonyms in OpenOffice³ Writer, CLIP can be seen as a larger alternative
for the same purpose. Still, since size is not the only important property of a
thesaurus, a smaller thesaurus can always be created by filtering out less common words.
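A smaller thesaurus of the kind mentioned above could be derived by dropping rare words from each synset. The sketch below is only an illustration under stated assumptions: the frequency table and the `min_freq` threshold are hypothetical, and the choice to discard synsets left with fewer than two words is our own design decision, not a step described in the text.

```python
from typing import Dict, List, Set


def filter_thesaurus(
    synsets: List[Set[str]], freq: Dict[str, int], min_freq: int
) -> List[Set[str]]:
    """Keep only words with corpus frequency >= min_freq (hypothetical threshold).

    Words missing from `freq` are treated as frequency 0. Synsets reduced to
    fewer than two words are discarded, since they no longer express synonymy.
    """
    filtered: List[Set[str]] = []
    for synset in synsets:
        kept = {w for w in synset if freq.get(w, 0) >= min_freq}
        if len(kept) >= 2:
            filtered.append(kept)
    return filtered


if __name__ == "__main__":
    # Hypothetical synsets and made-up frequency counts, for illustration only.
    synsets = [{"carro", "automóvel", "viatura"}, {"casa", "lar", "vivenda"}]
    freq = {"carro": 900, "automóvel": 300, "viatura": 40, "casa": 1500, "lar": 20}
    print(filter_thesaurus(synsets, freq, min_freq=50))
```

With these made-up counts, only the first synset survives, reduced to "carro" and "automóvel"; the second keeps just "casa" and is therefore dropped.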
The proposed algorithm may also be used to create a fuzzy thesaurus, where each word
has a degree of membership in each synset. Given that word senses are not discrete,
representing natural language concepts as fuzzy synsets is closer to reality than using
crisp synsets. Moreover, a fuzzy thesaurus is a useful resource for NLP. For instance,
in WSD, choosing the synset where the target word has the highest membership might be
used as a baseline. As far as we know, the fuzzy version of our thesaurus is the first
Portuguese thesaurus with fuzzy memberships.
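The WSD baseline suggested above can be sketched in a few lines. We assume, for illustration only, that a fuzzy synset is represented as a dictionary from words to membership degrees in [0, 1]; the function name and the example memberships are hypothetical and do not come from the actual resource.

```python
from typing import Dict, List, Optional

# Assumed representation: a fuzzy synset maps each word to its membership degree.
FuzzySynset = Dict[str, float]


def most_likely_synset(word: str, thesaurus: List[FuzzySynset]) -> Optional[int]:
    """Return the index of the synset where `word` has the highest membership.

    Returns None if the word has no positive membership in any synset.
    Ties are broken in favour of the first synset found.
    """
    best_index: Optional[int] = None
    best_degree = 0.0
    for i, synset in enumerate(thesaurus):
        degree = synset.get(word, 0.0)
        if degree > best_degree:
            best_index, best_degree = i, degree
    return best_index


if __name__ == "__main__":
    # Hypothetical memberships for the ambiguous word "banco" (bench / bank).
    thesaurus = [
        {"banco": 0.4, "assento": 0.9, "cadeira": 0.6},
        {"banco": 0.8, "instituição": 0.7},
    ]
    print(most_likely_synset("banco", thesaurus))
```

Being context-independent, such a baseline always assigns the same sense to a word, which is exactly what makes it useful as a lower bound for context-aware WSD methods.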
The presented approach has, however, shown some limitations. For instance, a fixed
cut-point is probably not the best option when moving from a fuzzy thesaurus to a
thesaurus without fuzzy memberships. Therefore, we have recently added the possibility
of a variable cut-point, relative to the highest membership in the set. Possibly the
main limitation of our approach is that no synset is created for a word sense that has
no dictionary entry with synonyms. This is something we will have to deal with in the
future. Finally, the manual evaluation showed interesting but not optimal results (75%
accuracy), which indicates that there is still room for improvement.
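The difference between a fixed and a variable cut-point can be illustrated as follows. This is a sketch under an assumption: we interpret "relative to the highest membership in the set" as keeping words whose membership is at least a fraction `alpha` of the set's maximum. The parameter names `theta` and `alpha` and the example memberships are hypothetical.

```python
from typing import Dict, Set


def fixed_cut(synset: Dict[str, float], theta: float) -> Set[str]:
    """Keep words whose membership is at least the absolute threshold theta."""
    return {w for w, mu in synset.items() if mu >= theta}


def relative_cut(synset: Dict[str, float], alpha: float) -> Set[str]:
    """Keep words whose membership is at least alpha times the highest membership.

    Assumed interpretation of a variable cut-point; alpha is in (0, 1].
    """
    if not synset:
        return set()
    top = max(synset.values())
    return {w for w, mu in synset.items() if mu >= alpha * top}


if __name__ == "__main__":
    # A hypothetical fuzzy synset whose memberships are all fairly low.
    weak = {"rápido": 0.30, "veloz": 0.25, "ligeiro": 0.10}
    print(fixed_cut(weak, theta=0.5))     # the fixed cut-point discards every word
    print(relative_cut(weak, alpha=0.8))  # the relative cut-point keeps the core words
```

The example shows why a fixed cut-point can be problematic: a synset whose memberships are uniformly low disappears entirely, whereas a cut-point relative to its strongest member still preserves its core.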
In any case, as we will discuss in the following chapters, CLIP could be used as the
synset-base for a future wordnet-like resource. However, TeP is a similar public
alternative, and since it was created manually by experts, we have high confidence in
its contents. We therefore decided to use it as the starting point of our synset-base.
The next chapter describes how TeP can be enriched with synonymy information extracted
from dictionaries in order to obtain a broader thesaurus. To discover new synsets, only
the synpairs not added to TeP are clustered, with an algorithm similar to the one
presented here.
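The flow described above, where only the leftover synpairs reach the clustering step, might look like the sketch below. Both parts are assumptions: the criterion for a synpair being "added to TeP" is left as a caller-supplied predicate `is_added` (hypothetical), and connected components over the synonymy graph are used as a simple stand-in for the actual clustering algorithm, which is not reproduced here.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def cluster_leftover_pairs(
    pairs: Iterable[Pair], is_added: Callable[[Pair], bool]
) -> List[Set[str]]:
    """Group synpairs not added to TeP into candidate synsets.

    Stand-in clustering: each connected component of the synonymy graph built
    from the leftover pairs becomes one candidate synset.
    """
    graph: DefaultDict[str, Set[str]] = defaultdict(set)
    for a, b in pairs:
        if is_added((a, b)):
            continue  # already absorbed by TeP; not passed to clustering
        graph[a].add(b)
        graph[b].add(a)

    seen: Set[str] = set()
    clusters: List[Set[str]] = []
    for start in graph:
        if start in seen:
            continue
        component: Set[str] = set()
        stack = [start]
        while stack:  # iterative depth-first search over one component
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        clusters.append(component)
    return clusters


if __name__ == "__main__":
    # Hypothetical TeP synset and predicate: a pair counts as added when both
    # of its words already share a TeP synset.
    tep = [{"carro", "automóvel"}]
    added = lambda p: any(p[0] in s and p[1] in s for s in tep)
    pairs = [("carro", "automóvel"), ("alegre", "contente"),
             ("contente", "feliz"), ("casa", "lar")]
    print(cluster_leftover_pairs(pairs, added))
```

Connected components are much cruder than a proper clustering algorithm, since a single spurious synpair can merge two unrelated synsets; they are used here only to make the data flow concrete.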
3 See http://www.openoffice.org/ (September 2012)