
• They are not built for Portuguese from scratch, and thus have to deal with translation issues, including problems such as lexical gaps;

• They do not handle word senses, which might lead to inconsistencies regarding lexical ambiguity.

Given this scenario, we set as our goal the development of computational tools for acquiring, structuring and integrating lexical-semantic knowledge from text. Although some of these tools can be used independently, they were developed with the exploitation of Portuguese resources in mind and with the aim of creating a new lexical ontology for Portuguese, in which the aforementioned limitations are minimised. Consequently, the resulting resource would be:

• Public domain, and thus free to be used by anyone, in both research and commercial settings. We believe this is the best way for the resource to play its role in helping to advance the state of the art of Portuguese NLP. Furthermore, a larger community of users tends to provide important feedback, useful for improving the resource.

• Created automatically, by exploiting textual resources and other public LKBs, all created from scratch for one or more variants of Portuguese. Automatic construction enables the creation of larger and broader resources, at the cost of lower reliability, which is nevertheless still acceptable for most tasks.

• Structured according to the wordnet model. This choice relied on the wide acceptance of this model and on the broad range of algorithms that operate over this kind of structure to perform various NLP tasks.
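
As a brief illustration of this kind of structure, the short Python sketch below encodes a few synsets connected by hypernymy links and walks a word's hypernymy chain, the sort of traversal behind many of the algorithms mentioned above. The Portuguese synsets and links are invented examples, not entries of the resource.

    # A minimal wordnet-like structure: synsets (sets of synonymous lexical
    # items) connected by semantic relations, here restricted to hypernymy.
    # All words and links below are illustrative assumptions.
    synsets = {
        "s1": {"cão", "cachorro"},   # dog
        "s2": {"mamífero"},          # mammal
        "s3": {"animal"},            # animal
    }
    hypernym_of = {"s1": "s2", "s2": "s3"}  # s1 is-a s2, s2 is-a s3

    def hypernym_path(synset_id):
        """Follow hypernymy links upwards, a typical operation over this structure."""
        path = [synset_id]
        while synset_id in hypernym_of:
            synset_id = hypernym_of[synset_id]
            path.append(synset_id)
        return path

    print(hypernym_path("s1"))  # ['s1', 's2', 's3']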

1.2 Approach

Our flexible approach to the acquisition, organisation and integration of lexical-semantic knowledge involves three main automatic steps. Each step is independent of the others and can be used on its own for simpler tasks. Alternatively, their combination enables the integration of lexical-semantic knowledge from different heterogeneous sources and results in a wordnet-like ontology. The three steps are briefly described as follows:

1. Extraction: instances of semantic relations held between lexical items are automatically extracted from text. As long as the extracted instances are represented as triples (two items connected by a predicate), the extraction techniques used in this step do not affect the following steps. In the specific case of our work, we followed a pattern-based extraction over dictionary definitions (see the extraction sketch below).

2. Thesaurus enrichment and clustering: if there is a conceptual base with synsets for the target language, its synsets are augmented with the extracted synonymy relations. For this purpose, the network established by all extracted synonymy instances (synpairs) is exploited to compute the similarity between each synset and each synpair, and both elements of a synpair are then added to their most similar synset (see the enrichment sketch below). As for synpairs whose two lexical items are not covered by the thesaurus, they are clustered in order to discover new synsets.
