Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...
8.1. Overview
items inside a synset are ordered according to the frequency with which each of them is used
to denote the sense corresponding to the meaning of the synset. This information is
based on the annotations of SemCor (Miller et al., 1994), a sense-annotated corpus.
8.1.1 Underlying lexical network<br />
After the manual evaluation of the semantic relation extraction (see section 4.2.5),
we identified a few problems in this step. Besides minor changes in the grammars,
some lemmatisation rules were refined and some filters were added to avoid relations
between lexical items such as:
• cf, used several times in the middle of DA definitions to introduce bibliographic
references;
• transitivo, intransitivo, reflexivo, and other verb-classifying words, incorrectly
extracted from Wiktionary as synonyms of verbs;
• synonymy between verbs in the gerund, often incorrect because the verb in the
gerund refers to an action that specifies the previous verb (e.g. in estender,
puxando and other examples in section 6.4.3).
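As an illustration only, the three filters listed above could be sketched as a single predicate over tb-triples. Everything named here is an assumption rather than the actual implementation: the `keep_triple` function, the `synonym_of` relation label, the contents of `BLOCKED_ITEMS` (the source gives only a partial list) and, in particular, the suffix-based gerund test, which is a crude heuristic that would also match non-verbs such as "quando".

```python
# Hypothetical block list; the source names only these items and
# "other verb-classifying words", so the real list is longer.
BLOCKED_ITEMS = {"cf", "transitivo", "intransitivo", "reflexivo"}

# Assumption: Portuguese gerunds are detected by their regular endings.
GERUND_SUFFIXES = ("ando", "endo", "indo", "ondo")


def looks_like_gerund(word):
    """Crude suffix heuristic; only meaningful for items known to be verbs."""
    return word.endswith(GERUND_SUFFIXES)


def keep_triple(arg1, relation, arg2):
    """Return False for triples that the filters described above would drop."""
    # Filter 1 and 2: abbreviations and verb-classifying words as arguments.
    if arg1 in BLOCKED_ITEMS or arg2 in BLOCKED_ITEMS:
        return False
    # Filter 3: synonymy where one argument is a verb in the gerund.
    if relation == "synonym_of" and (
        looks_like_gerund(arg1) or looks_like_gerund(arg2)
    ):
        return False
    return True


candidates = [
    ("estender", "synonym_of", "puxando"),
    ("cf", "synonym_of", "livro"),
    ("estender", "synonym_of", "esticar"),
]
filtered = [t for t in candidates if keep_triple(*t)]
```

In this sketch only the last candidate survives; a real filter would presumably rely on POS information instead of suffixes.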
These corrections resulted in CARTÃO 3.1, a new version of this resource, after
augmentation with:
• Antonymy relations from TeP 2.0, which comprise 4,276 sb-triples – 1,407 between
nouns, 1,158 between verbs, 1,562 between adjectives and 149 between
adverbs. Given that the final Onto.PT synsets are not exactly the same as
those in TeP, these antonymy relations were converted to tb-triples. For this
purpose, each sb-triple resulted in several antonymy tb-triples, each one connecting
one lexical item from the synset in the first argument with an item
from the synset in the second argument.
• Synsets from OpenThesaurus.PT, more precisely, those whose POS we could
identify, which comprise 3,925 synsets – 1,971 nouns, 831 verbs, 1,079 adjectives
and 44 adverbs. As TeP was our synset-base, these synsets were converted
to tb-triples, whose arguments would later be added to TeP synsets.
For this purpose, each synset resulted in several synonymy tb-triples, each one
connecting two different lexical items in the synset.
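The two conversions described in the list above can be sketched as follows. This is a minimal illustration, not the code actually used: the function names and the relation labels `antonym_of` and `synonym_of` are hypothetical, and synonymy pairs are assumed to be unordered.

```python
from itertools import combinations, product


def antonymy_tb_triples(synset_a, synset_b):
    """Expand one antonymy sb-triple into tb-triples.

    Every lexical item of the first synset is paired with every
    lexical item of the second synset.
    """
    return [
        (a, "antonym_of", b)
        for a, b in product(sorted(synset_a), sorted(synset_b))
    ]


def synonymy_tb_triples(synset):
    """Expand one synset into synonymy tb-triples.

    Each triple connects two different lexical items of the synset;
    pairs are treated as unordered (an assumption of this sketch).
    """
    return [(a, "synonym_of", b) for a, b in combinations(sorted(synset), 2)]


ant = antonymy_tb_triples({"bom", "bondoso"}, {"mau"})
syn = synonymy_tb_triples({"carro", "viatura", "auto"})
```

Under these assumptions, an sb-triple between synsets of sizes m and n yields m × n antonymy tb-triples, and a synset of size n yields n(n − 1)/2 synonymy tb-triples.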
Finally, the tb-triples connecting two lexical items not occurring in
CETEMPúblico or in TeP were discarded, unless they were extracted from more
than one resource. This can be seen as a first approach to eliminating very infrequent
and probably not useful words from Onto.PT. Table 8.1 shows the distribution of the
tb-triples in the lexical network used to create Onto.PT v.0.35.
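The discarding rule above can be illustrated with a short sketch. It assumes one reading of the rule, namely that a tb-triple is dropped only when neither of its arguments is attested in CETEMPúblico or TeP and it was extracted from a single resource; a stricter reading (dropping a triple when either argument is unattested) is also possible. The function name, the `attested` set and the toy data are hypothetical.

```python
def filter_tb_triples(triples_by_resource, attested):
    """Keep tb-triples that are attested or supported by several resources.

    triples_by_resource: dict mapping a resource name to a set of
        (arg1, relation, arg2) tuples extracted from that resource.
    attested: set of lexical items occurring in the corpus or in TeP.
    """
    # Record which resources each triple was extracted from.
    sources = {}
    for resource, triples in triples_by_resource.items():
        for triple in triples:
            sources.setdefault(triple, set()).add(resource)

    kept = set()
    for triple, found_in in sources.items():
        arg1, _, arg2 = triple
        # Assumed reading: both arguments must be unattested to trigger the rule.
        both_unattested = arg1 not in attested and arg2 not in attested
        if both_unattested and len(found_in) < 2:
            continue  # rare items supported by a single resource are dropped
        kept.add(triple)
    return kept


# Toy data: "xpto", "zzyx", "abc1" and "abc2" are made-up items.
extracted = {
    "DA": {("xpto", "synonym_of", "zzyx"), ("casa", "synonym_of", "lar")},
    "Wiktionary": {("abc1", "synonym_of", "abc2"), ("casa", "synonym_of", "lar")},
    "OpenThesaurus.PT": {("abc1", "synonym_of", "abc2")},
}
kept = filter_tb_triples(extracted, attested={"casa", "lar"})
```

Here the triple between "xpto" and "zzyx" is discarded, while the one between "abc1" and "abc2" survives because two resources agree on it.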
8.1.2 Synsets<br />
We recall that the synsets of TeP 2.0 were used as a starting point for creating
the Onto.PT synset-base. The assignment algorithm described in section 6.1.2 was
used to enrich TeP with the synpairs of CARTÃO, after the latter resource was
augmented, as described in the previous section. Following the experimentation described
in section 6.2, we decided to use the cosine similarity measure, with mode