24.07.2013 Views

Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...



which can be seen as another contribution of this thesis, enabled us to use the same grammars for extracting information from three dictionaries.

Together with Onto.PT, other lexical-semantic resources were developed by using each of the ECO steps independently. These resources, listed below, contribute to advancing the state of the art of Portuguese LKBs:

• CARTÃO, the largest term-based lexical-semantic network for Portuguese;
• CLIP, as far as we know, the first broad-coverage fuzzy thesaurus for Portuguese;
• TRIP, the largest public synset-based thesaurus for Portuguese.

We should add that most of the work performed during the course of this thesis is reported in a total of 14 scientific papers, presented and/or published in national and international venues, such as IJCAI, ECAI, EPIA and NLDB (see more in section 9.1).

1.4 Outline of the thesis

After two chapters on background knowledge and related work, each chapter of this thesis focuses on an automatic procedure that is part of the ECO approach and performs one step towards our final goal, the creation of Onto.PT. Besides describing each procedure, each chapter reports one or more experiments towards its validation. Before concluding, the latest version of Onto.PT available while this thesis was written is presented, together with examples of scenarios where it might be useful. At the end of the thesis, two appendices are included: (A), with an extensive list of the semantic relations in Onto.PT and their descriptions; and (B), which shows a rough manual mapping between the Onto.PT synsets and the core concepts of Princeton WordNet. We now briefly describe each chapter:

Chapter 2 introduces (mostly) theoretical background knowledge that supports this research. It starts with some remarks on lexical semantics, the subfield of semantics that deals with words and meanings. Then, different formalisms for representing lexical-semantic computational resources are described. Given that our work is related to the NLP field of information extraction, the last section is dedicated to this topic.

Chapter 3 covers concrete work related to our research. First, it describes well-known lexical knowledge bases for English and for Portuguese. Second, it presents work on information extraction from dictionaries and from corpora. Third, it reviews work on the automatic enrichment or integration of existing knowledge bases.

Chapter 4 explains our work towards the acquisition of semantic relations from dictionaries. Because many regularities hold across the definitions of different dictionaries, we reuse existing handcrafted grammars, made for extracting semantic relations from one dictionary, to extract relations from other dictionaries. This results in CARTÃO, a large lexical network for Portuguese that integrates knowledge from three different dictionaries.
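
The idea of reusing one set of extraction rules across several dictionaries can be illustrated with a minimal sketch. It is not the thesis's grammars, which are not shown in this section: the regular expressions, the relation names (SYNONYM_OF, HYPONYM_OF, PART_OF) and the toy dictionary entries below are all assumptions made for illustration only.

```python
import re

# Hypothetical definition patterns (assumed for illustration; the actual
# handcrafted grammars of the thesis are not reproduced here).
PATTERNS = [
    ("SYNONYM_OF", re.compile(r"^o mesmo que (\w+)")),
    ("HYPONYM_OF", re.compile(r"^(?:tipo|variedade) de (\w+)")),
    ("PART_OF", re.compile(r"^parte d[eoa] (\w+)")),
]


def extract_relations(headword, definition):
    """Return (headword, relation, target) triples found in one definition."""
    text = definition.lower().strip()
    triples = []
    for relation, pattern in PATTERNS:
        match = pattern.search(text)
        if match:
            triples.append((headword, relation, match.group(1)))
    return triples


def build_network(dictionaries):
    """Apply the same patterns to every dictionary and merge the triples."""
    network = set()
    for dictionary in dictionaries:
        for headword, definition in dictionary.items():
            network.update(extract_relations(headword, definition))
    return network


# Toy entries standing in for three different dictionaries (invented data).
dict_a = {"cão": "Tipo de animal doméstico"}
dict_b = {"cão": "tipo de animal", "roda": "parte do carro"}
dict_c = {"automóvel": "o mesmo que carro"}

network = build_network([dict_a, dict_b, dict_c])
for triple in sorted(network):
    print(triple)
```

Because the triples are collected in a set, a relation found in more than one dictionary is stored only once, which mirrors in a very simplified way how a single network can integrate knowledge from several sources.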
