24.07.2013

Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...


Chapter 8. Onto.PT: a lexical ontology for Portuguese

acto, ...). Another important difference concerns the size and the correctness of the synsets of MWN.PT and Onto.PT. The former contains small synsets, often with only one word, while the latter, as mentioned earlier, contains large synsets. On the other hand, because MWN.PT was manually revised, its synsets are presumably all correct, while Onto.PT, being built automatically, contains errors.

8.4 Using Onto.PT

The main goal of creating Onto.PT is to exploit it in tasks involving the computational processing of Portuguese. As mentioned earlier in this chapter, applying an ontology to such tasks is also a popular approach to validating it (Brank et al., 2005).

To illustrate the utility of a resource such as Onto.PT, this section presents usage scenarios in which it can make a valuable contribution. All the scenarios are intended only as proofs of concept: none of the techniques used is very sophisticated, and we did not evaluate them in depth. We start with an exercise that explores the taxonomy of Onto.PT. Then, we show how Onto.PT can be applied to word sense disambiguation (WSD). After that, we briefly describe how this resource was integrated into an information retrieval (IR) system to enhance query expansion; the IR system was evaluated through its participation in a joint IR task. The last usage scenario takes advantage of Onto.PT to answer cloze questions automatically.

8.4.1 Exploring the Onto.PT taxonomy

The first usage example is a simple exploration exercise, showing that, besides providing synonyms for lexical items, Onto.PT can be queried for taxonomic information, as well as for other semantic information about the organisation of the lexicon.

Figure 8.5 shows a four-level taxonomy obtained from Onto.PT that includes cão (dog). For the sake of simplicity, some synset entries, as well as non-hypernymy relations, are omitted from this figure.

The taxonomy shows that Onto.PT can be used, for instance, to collect a list of animals, a list of mammals, or a list of dog breeds. Starting at the most general level, with a synset denoting an animal, it is possible to obtain kinds of animals, including birds (ave), insects (insecto), and mammals (mamífero). The mammal synset can in turn be expanded to obtain mammal synsets, including cow (vaca), whale (baleia), cat (gato), and dog (cão). Finally, if the dog synset is expanded, its hyponyms include several dog breeds, such as boxer, mongrel (rafeiro), and dalmatian (dálmata).
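The exploration above amounts to following hyponymy links downwards from a chosen synset. A minimal Python sketch, assuming the hypernymy relations have already been loaded into a dictionary that maps each synset to its direct hyponyms, could collect such lists with a breadth-first traversal. The `HYPONYMS` fragment, the labelling of each synset by a single head word, and the `hyponyms_closure` function are hypothetical; they mirror only the entries named in the text, not Onto.PT's actual data format or interface.

```python
from collections import deque

# Hypothetical fragment of the hypernymy graph around cão (dog):
# each synset (labelled here by one head word) maps to its direct hyponyms.
HYPONYMS = {
    "animal": ["ave", "insecto", "mamífero"],
    "mamífero": ["vaca", "baleia", "gato", "cão"],
    "cão": ["boxer", "rafeiro", "dálmata"],
}


def hyponyms_closure(root, hyponyms, max_depth=None):
    """Return the synsets below `root` in breadth-first order,
    optionally stopping after `max_depth` levels."""
    found, seen = [], {root}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand beyond the requested depth
        for child in hyponyms.get(node, []):
            if child not in seen:
                seen.add(child)
                found.append(child)
                queue.append((child, depth + 1))
    return found


print(hyponyms_closure("animal", HYPONYMS, max_depth=1))  # ['ave', 'insecto', 'mamífero']
print(hyponyms_closure("cão", HYPONYMS))  # ['boxer', 'rafeiro', 'dálmata']
```

With `max_depth=1` only direct hyponyms are returned (the kinds of animals), while omitting it collects the whole subtree, so listing mammals also yields, transitively, the dog breeds below cão.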

8.4.2 Word sense disambiguation

There is a wide range of knowledge-based WSD algorithms that use a wordnet both as a sense inventory and as an additional source of knowledge (e.g. Resnik (1995); Banerjee and Pedersen (2002); Agirre and Soroa (2009)). As Onto.PT is structured similarly to a wordnet, most of these algorithms can be adapted to use Onto.PT for Portuguese WSD. We have implemented two algorithms for this task: Bag-of-Words and Personalized PageRank (Agirre and Soroa, 2009).
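The general idea behind both algorithms can be sketched in Python under explicitly stated assumptions: a toy, hypothetical synset inventory for the ambiguous word banco (a financial institution or a seat), hypothetical relations treated as undirected edges, and invented function names. In this sketch, the bag-of-words variant scores each candidate sense by the overlap between the context and the words of the synset plus its directly related synsets, while the Personalized PageRank variant places the teleport mass on the synsets of the other context words and picks the candidate sense with the highest rank. It is an illustration of the techniques, not the implementation used with Onto.PT.

```python
# Hypothetical toy inventory: synset id -> set of lexical items.
SYNSETS = {
    "s_banco_inst": {"banco", "instituição"},
    "s_banco_assento": {"banco", "assento"},
    "s_dinheiro": {"dinheiro"},
    "s_conta": {"conta"},
    "s_cadeira": {"cadeira"},
    "s_jardim": {"jardim", "parque"},
}

# Hypothetical relations between synsets, treated as undirected edges.
RELATIONS = [
    ("s_banco_inst", "s_dinheiro"),
    ("s_banco_inst", "s_conta"),
    ("s_dinheiro", "s_conta"),
    ("s_banco_assento", "s_cadeira"),
    ("s_banco_assento", "s_jardim"),
]


def build_graph(synsets, relations):
    """Adjacency sets for an undirected synset graph."""
    graph = {s: set() for s in synsets}
    for a, b in relations:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def senses_of(word, synsets):
    """Candidate senses: every synset that contains the word."""
    return [s for s, words in synsets.items() if word in words]


def bag_of_words_wsd(target, context, synsets, graph):
    """Pick the sense whose extended bag of words overlaps most with the context."""
    ctx = set(context) - {target}

    def score(sense):
        bag = set(synsets[sense])
        for neighbour in graph[sense]:
            bag |= synsets[neighbour]
        return len(bag & ctx)

    return max(senses_of(target, synsets), key=score, default=None)


def personalized_pagerank(graph, seeds, damping=0.85, iterations=30):
    """Power iteration with the teleport mass restricted to the seed synsets."""
    nodes = list(graph)
    if seeds:
        teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    else:  # no seeds: fall back to ordinary (uniform) PageRank
        teleport = {n: 1.0 / len(nodes) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            if not graph[n]:
                continue  # dangling node: its mass is simply dropped here
            share = damping * rank[n] / len(graph[n])
            for m in graph[n]:
                new_rank[m] += share
        rank = new_rank
    return rank


def ppr_wsd(target, context, synsets, graph):
    """Pick the target sense ranked highest when the other context words are the seeds."""
    seeds = {s for w in context if w != target for s in senses_of(w, synsets)}
    rank = personalized_pagerank(graph, seeds)
    return max(senses_of(target, synsets), key=rank.get, default=None)


GRAPH = build_graph(SYNSETS, RELATIONS)
context = ["depositei", "dinheiro", "na", "conta", "do", "banco"]
print(bag_of_words_wsd("banco", context, SYNSETS, GRAPH))  # s_banco_inst
print(ppr_wsd("banco", context, SYNSETS, GRAPH))  # s_banco_inst
```

Excluding the target word from the seeds keeps its own candidate senses from being favoured simply because they were given teleport mass; the decision then depends only on how strongly each sense is connected to the rest of the context.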
