24.07.2013 Views

Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...


5.4. Discussion

discovered synsets result in CLIP, a Portuguese thesaurus larger than the public domain handcrafted Portuguese thesauri. CLIP was compared with those thesauri, which led us to the conclusion that we can obtain an even larger thesaurus if we integrate all thesauri. Given that OT.PT, the smaller thesaurus used in our experimentation, is currently used for suggesting synonyms in OpenOffice³ Writer, CLIP can be seen as a larger alternative for the same purpose. Still, since size is not the only important property of a thesaurus, it is always possible to create a smaller thesaurus by filtering out less common words.

The proposed algorithm may be used for the creation of a fuzzy thesaurus, where words have membership degrees to each synset. Bearing in mind that word senses are not discrete, representing natural language concepts as fuzzy synsets is closer to reality than using simple synsets. Moreover, a fuzzy thesaurus is a useful resource for NLP. For instance, in WSD, choosing the synset where the target word has the highest membership might be used as a baseline. As far as we know, the fuzzy version of our thesaurus is the first Portuguese thesaurus with fuzzy memberships.
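
The WSD baseline mentioned above can be illustrated with a minimal sketch. It assumes a fuzzy synset is represented as a mapping from words to membership degrees in [0, 1]; the function name `most_likely_synset` and the toy memberships are hypothetical and invented for illustration, not taken from the actual thesaurus.

```python
from typing import Dict, List, Optional

# Assumed representation: a fuzzy synset maps each word to its
# membership degree in [0, 1].
FuzzySynset = Dict[str, float]


def most_likely_synset(word: str, synsets: List[FuzzySynset]) -> Optional[int]:
    """Return the index of the synset where `word` has the highest
    membership, or None if the word occurs in no synset."""
    best_index: Optional[int] = None
    best_membership = 0.0
    for index, synset in enumerate(synsets):
        membership = synset.get(word, 0.0)
        if membership > best_membership:
            best_index, best_membership = index, membership
    return best_index


# Toy memberships, invented for illustration only.
toy_synsets: List[FuzzySynset] = [
    {"banco": 0.9, "assento": 0.7, "mocho": 0.4},  # "bench" sense
    {"banco": 0.6, "instituição": 0.8},            # "bank" sense
]

print(most_likely_synset("banco", toy_synsets))
```

Such a baseline ignores the context of the target word entirely, so it is only a point of comparison for context-aware WSD methods.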

The presented approach has, however, shown some limitations. For instance, a fixed cut-point is probably not the best option when moving from a fuzzy thesaurus to a thesaurus without fuzzy memberships. Therefore, we have recently added the possibility of having a variable cut-point, relative to the highest membership in the set. Possibly the main limitation of our approach is that synsets are not created when word senses do not have dictionary entries with synonyms. This is something we will have to deal with in the future. Finally, the manual evaluation showed interesting but not optimal results (75% accuracy), which indicates that there is still room for improvement.
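
The difference between a fixed and a variable cut-point can be sketched as follows. This is an illustration under assumptions: the variable cut-point is interpreted here as a fraction (`ratio`) of the highest membership in the synset, which is one plausible reading of "relative to the highest membership in the set". The function names, parameters, and toy values are hypothetical.

```python
from typing import Dict, Set

# Assumed representation: a fuzzy synset maps each word to its
# membership degree in [0, 1].
FuzzySynset = Dict[str, float]


def fixed_cut(synset: FuzzySynset, theta: float) -> Set[str]:
    """Keep the words whose membership is at least the fixed threshold."""
    return {word for word, mu in synset.items() if mu >= theta}


def relative_cut(synset: FuzzySynset, ratio: float) -> Set[str]:
    """Keep the words whose membership is at least `ratio` times the
    highest membership in this synset (assumed interpretation)."""
    if not synset:
        return set()
    threshold = ratio * max(synset.values())
    return {word for word, mu in synset.items() if mu >= threshold}


# Toy synset with uniformly low memberships, invented for illustration.
toy = {"carro": 0.5, "automóvel": 0.45, "viatura": 0.2}

print(fixed_cut(toy, 0.6))     # a fixed threshold can discard the whole synset
print(relative_cut(toy, 0.8))  # the relative threshold adapts to the synset
```

In this toy case the fixed cut-point of 0.6 removes every word, while the relative one keeps the words close to the strongest member, which is the kind of situation a variable cut-point is meant to handle.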

In any case, as we will discuss in the following chapters, CLIP could be used as the synset-base for a future wordnet-like resource. However, TeP is a similar public alternative and, as it was created manually by experts, we have high confidence in its contents, so we decided to use it as the starting point of our synset-base. The next chapter describes how TeP can be enriched with synonymy information extracted from dictionaries, in order to have a broader thesaurus. To discover new synsets, only the synpairs not added to TeP are the target of a clustering algorithm, similar to the one presented here.

³ See http://www.openoffice.org/ (September 2012)
