24.07.2013 Views

Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...

8.1. Overview

items inside a synset are ordered according to the frequency with which each of them is used to denote the sense corresponding to the meaning of the synset. This information is based on the annotations of SemCor (Miller et al., 1994), a sense-annotated corpus.

8.1.1 Underlying lexical network<br />

After the manual evaluation of the semantic relation extraction (see section 4.2.5), we identified a few problems in this step. Besides minor changes in the grammars, some lemmatisation rules were refined and some filters were added to avoid relations between lexical items such as:

• cf, used several times in the middle of DA definitions for introducing bibliographic references;
• transitivo, intransitivo, reflexivo, and other verb classifying words, incorrectly extracted from Wiktionary as synonyms of verbs;
• synonymy between verbs in the gerund, often incorrect because the verb in the gerund refers to an action that specifies the previous verb (e.g. in estender, puxando and other examples in section 6.4.3).
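
The filters listed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the stop list, the relation and POS labels, and the suffix-based gerund test are hypothetical and are not taken from the actual extraction grammars.

```python
# Hypothetical stop list built from the items named in the text;
# the real filters may cover more words.
BLOCKED_ITEMS = {"cf", "transitivo", "intransitivo", "reflexivo"}


def is_gerund(verb: str) -> bool:
    # Assumption: a Portuguese gerund is recognised by its "-ndo" suffix only.
    return verb.endswith("ndo")


def keep_triple(arg1: str, relation: str, arg2: str, pos: str) -> bool:
    """Return False for triples that the filters described above would drop."""
    # Drop relations involving bibliographic or verb-classifying words.
    if arg1 in BLOCKED_ITEMS or arg2 in BLOCKED_ITEMS:
        return False
    # Drop synonymy between verbs when either argument is in the gerund.
    if relation == "synonymy" and pos == "verb":
        if is_gerund(arg1) or is_gerund(arg2):
            return False
    return True
```

The gerund test is applied only to verbs, so nouns that happen to end in "-ndo" are not affected.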

These corrections resulted in CARTÃO 3.1, a new version of this resource, after augmentation with:

• Antonymy relations from TeP 2.0, which comprise 4,276 sb-triples: 1,407 between nouns, 1,158 between verbs, 1,562 between adjectives and 149 between adverbs. Given that the final Onto.PT synsets are not exactly the same as in TeP, the former antonymy relations were converted to tb-triples. For this purpose, each sb-triple resulted in several antonymy tb-triples, each one connecting one lexical item from the synset in the first argument with an item from the synset in the second argument.

• Synsets from OpenThesaurus.PT, more precisely, those for which we could identify the POS, which comprise 3,925 synsets: 1,971 nouns, 831 verbs, 1,079 adjectives and 44 adverbs. As TeP was our synset-base, the former relations were converted to tb-triples, whose arguments would later be added to TeP synsets. For this purpose, each synset resulted in several synonymy tb-triples, each one connecting two different lexical items in the synset.
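
The two conversions described above can be illustrated with a short sketch. The function names, the tuple layout of a tb-triple and the relation labels are assumptions made for this example. It also assumes that one triple per unordered pair is enough for synonymy.

```python
from itertools import combinations, product


def antonymy_sb_to_tb(synset_a, synset_b):
    """Expand one antonymy sb-triple (synset, synset) into tb-triples,
    pairing every item of the first synset with every item of the second."""
    return [(a, "antonymy", b) for a, b in product(synset_a, synset_b)]


def synset_to_synonymy_tb(synset):
    """Expand one synset into synonymy tb-triples, one per pair of
    different lexical items (items are sorted for a stable output)."""
    items = sorted(set(synset))
    return [(a, "synonymy", b) for a, b in combinations(items, 2)]


# Toy data, for illustration only.
antonyms = antonymy_sb_to_tb(["quente", "cálido"], ["frio", "gelado"])
synonyms = synset_to_synonymy_tb(["carro", "automóvel", "viatura"])
```

Under these assumptions, an sb-triple between synsets of sizes m and n yields m × n antonymy tb-triples, and a synset with k items yields k(k−1)/2 synonymy tb-triples.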

Finally, the tb-triples connecting two lexical items not occurring in CETEMPúblico or in TeP were discarded, unless they were extracted from more than one resource. This can be seen as a first approach to eliminating very infrequent and probably useless words from Onto.PT. Table 8.1 shows the distribution of the tb-triples in the lexical network used to create Onto.PT v.0.35.
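
A minimal sketch of this last filtering step is given below. It reads the rule literally: a triple is dropped only when neither of its two items is known and it comes from a single resource. A stricter reading, dropping the triple as soon as one item is unknown, is equally possible from the text. The data layout, the names and the toy data are hypothetical.

```python
def filter_rare_triples(triples, known_items):
    """Keep the tb-triples that survive the frequency filter.

    triples: dict mapping (arg1, relation, arg2) to the set of resources
             the triple was extracted from (assumed layout).
    known_items: lexical items occurring in the corpus or in the thesaurus.
    """
    kept = {}
    for triple, sources in triples.items():
        arg1, _relation, arg2 = triple
        both_unknown = arg1 not in known_items and arg2 not in known_items
        # Discard only if both items are unknown and one resource supports it.
        if both_unknown and len(sources) < 2:
            continue
        kept[triple] = sources
    return kept


# Toy data, for illustration only.
known = {"casa", "lar", "carro"}
toy_triples = {
    ("casa", "synonymy", "lar"): {"DA"},
    ("xpto", "synonymy", "zyx"): {"Wiktionary"},
    ("xpto", "synonymy", "abc"): {"DA", "Wiktionary"},
    ("casa", "synonymy", "xpto"): {"DA"},
}
kept_triples = filter_rare_triples(toy_triples, known)
```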

8.1.2 Synsets<br />

We recall that the synsets of TeP 2.0 were used as a starting point for creating the Onto.PT synset-base. The assignment algorithm described in section 6.1.2 was used to enrich TeP with the synpairs of CARTÃO, after the latter resource was augmented, as described in the previous section. Following the experimentation described in section 6.2, we decided to use the cosine similarity measure, with mode
