Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...
Chapter 8. Onto.PT: a lexical ontology for Portuguese
acto, ...). Another important difference concerns the size and the correctness of the
synsets of MWN.PT and Onto.PT. The former contains small synsets, often with
only one word, while the latter, as mentioned earlier, contains large synsets. On the
other hand, given its manual revision, the MWN.PT synsets are supposedly all
correct, while Onto.PT, due to its automatic construction, contains errors.

8.4 Using Onto.PT

The main goal of creating Onto.PT is its use in tasks involving the computational
processing of Portuguese. As mentioned earlier in this chapter, applying a resource
to such tasks is also a popular approach to validating ontologies (Brank et al., 2005).

To illustrate the utility of a resource such as Onto.PT, this section presents
usage scenarios in which it can be a valuable contribution. All the scenarios are
intended as mere proofs of concept: none of the techniques used is very
sophisticated, and we did not pursue their evaluation further. We start with an
exercise on exploring the taxonomy of Onto.PT. Then, we show how Onto.PT can be
applied to word sense disambiguation (WSD). After that, we briefly describe how
this resource was integrated into an information retrieval (IR) system to enhance
query expansion. The IR system was evaluated through participation in an IR joint
task. The last usage scenario takes advantage of Onto.PT to answer cloze questions
automatically.

8.4.1 Exploring the Onto.PT taxonomy

The first usage example is a simple exploration exercise. It shows that, besides
providing synonyms for lexical items, Onto.PT can be queried for taxonomic
information, as well as for other semantic information on the organisation of the
lexicon. Figure 8.5 shows a four-level taxonomy obtained from Onto.PT, where cão
(dog) is included. For the sake of simplicity, we omit some synset entries, as well as
non-hypernym relations, from this figure.

The taxonomy shows that Onto.PT can be used, for instance, to collect a list of
animals, a list of mammals, or a list of dog breeds. Starting at the most general
level, with a synset denoting an animal, it is possible to obtain kinds of animals,
including birds (ave), insects (insecto), and mammals (mamífero). The mammal
synset can in turn be expanded to obtain kinds of mammals, including cow (vaca),
whale (baleia), cat (gato), or dog (cão). Finally, if the hyponyms of the dog synset
are expanded, several dog breeds are shown, including boxer, mongrel (rafeiro) or
dalmatian (dálmata).
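
The expansion described above can be sketched as a breadth-first walk over hyponym links. This is only an illustrative sketch: the dictionary `HYPONYMS` and the function `descendants` are hypothetical names, the data contains just the examples named in the text (one word stands in for each synset), and it does not reflect the actual format or access interface of Onto.PT.

```python
from collections import deque

# Toy hyponym map holding only the examples named in the text.
# Hypothetical layout: each key stands for a synset, each value lists
# its direct hyponyms. Real Onto.PT synsets contain many more words.
HYPONYMS = {
    "animal": ["ave", "insecto", "mamífero"],
    "mamífero": ["vaca", "baleia", "gato", "cão"],
    "cão": ["boxer", "rafeiro", "dálmata"],
}


def descendants(root, hyponyms, max_depth=None):
    """Return (term, depth) pairs below `root`, in breadth-first order.

    `max_depth=1` yields only direct hyponyms; `None` walks the whole
    subtree.
    """
    seen = {root}
    queue = deque([(root, 0)])
    found = []
    while queue:
        term, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand below the requested depth
        for child in hyponyms.get(term, []):
            if child not in seen:  # guard against cycles in noisy data
                seen.add(child)
                found.append((child, depth + 1))
                queue.append((child, depth + 1))
    return found


dog_breeds = [t for t, _ in descendants("cão", HYPONYMS, max_depth=1)]
mammals = [t for t, _ in descendants("mamífero", HYPONYMS, max_depth=1)]
all_animals = [t for t, _ in descendants("animal", HYPONYMS)]
```

With this toy data, `dog_breeds` holds the three breeds, `mammals` the four kinds of mammal, and `all_animals` every term below the animal synset. The `seen` set matters for an automatically built resource, where erroneous links could introduce cycles.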

8.4.2 Word sense disambiguation

There is a wide range of knowledge-based WSD algorithms that use a wordnet both
as a sense inventory and as an additional source of knowledge (e.g. Resnik (1995);
Banerjee and Pedersen (2002); Agirre and Soroa (2009)). As Onto.PT is structured
similarly to a wordnet, most of these algorithms may be adapted to use Onto.PT
for Portuguese WSD. We have implemented two algorithms for this task:
Bag-of-Words and Personalized PageRank (Agirre and Soroa, 2009).
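
As a rough illustration of the general bag-of-words idea, the sketch below picks the candidate sense whose associated words overlap most with the context of the target word. It is a generic overlap heuristic written under our own assumptions, not the implementation used with Onto.PT: the function `bag_of_words_wsd`, the sense identifiers, and the toy word bags are all hypothetical.

```python
def bag_of_words_wsd(target, context_words, candidate_bags):
    """Pick the sense whose word bag shares the most words with the context.

    `candidate_bags` maps a sense identifier to the words associated
    with that sense (e.g. synset members and words in related synsets).
    The target word itself is ignored, since it occurs in every
    candidate. Ties are resolved in favour of the first candidate.
    """
    target = target.lower()
    context = {w.lower() for w in context_words} - {target}
    best_sense, best_score = None, -1
    for sense, bag in candidate_bags.items():
        words = {w.lower() for w in bag} - {target}
        score = len(context & words)  # size of the overlap
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Hypothetical toy data for the ambiguous word "banco" (bank / bench).
candidates = {
    "banco#finance": ["banco", "instituição", "dinheiro", "crédito"],
    "banco#seat": ["banco", "assento", "cadeira", "jardim"],
}
context = ["sentou", "no", "banco", "do", "jardim"]
chosen = bag_of_words_wsd("banco", context, candidates)
```

Here `chosen` is `"banco#seat"`, because only that bag shares a word (`jardim`) with the context. A practical version would at least lemmatise the context and remove stopwords before computing the overlap.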