Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...
3.1. Lexical Knowledge Bases
OpenThesaurus.PT¹³ (hereafter, OT.PT) is the Portuguese version of a collaborative
thesaurus initiative (Naber, 2004). It is approximately four times smaller
than TeP. OT.PT contains 13,258 lexical items, organised in 4,102 synsets, but the
project has not had any significant development since 2006. This resource is mainly
used in the OpenOffice¹⁴ word processor for suggesting synonyms.
Electronic dictionaries
There are several Portuguese dictionaries available for online queries. However, we
would like to mention two of them which, besides containing some additional explicit
semantic markup, are in the public domain and thus freely available for download and use.
Wiktionary.PT¹⁵ is a collaborative dictionary by the Wikimedia Foundation
where, besides the typical dictionary information, it is possible to add information
on semantic relations for each entry. For Portuguese, however, this resource is
still small and, besides other problems, most entries do not have information about
semantic relations. In May 2012, Wiktionary.PT contained almost 180,000 entries.
However, as all Wiktionaries are multilingual, not all of those entries correspond to
Portuguese words.
Dicionário Aberto (hereafter DA; Simões and Farinha, 2011; Simões et al., 2012)
is the electronic version of an old Portuguese dictionary from 1913, maintained
by Alberto Simões at the University of Minho. DA, whose orthography is currently
being modernised, has 128,521 entries. Recently, some semantic relations,
extracted using simple patterns, were added to DA's interface¹⁶, in a so-called
ontology view (Simões et al., 2012).
Lexical-semantic network
PAPEL (Gonçalo Oliveira et al., 2008, 2009, 2010b) is a public domain lexical resource
with instances of several types of semantic relations, extracted automatically
from Dicionário PRO da Língua Portuguesa (DLP, 2005), a Portuguese dictionary
owned by Porto Editora. It was developed by Linguateca. The main differences
between PAPEL and a wordnet are that PAPEL was created automatically and is
neither structured in synsets nor sense-aware. PAPEL can be seen as a lexical network:
it is structured in relational triples t = {w1, R, w2}, each denoting an instance of a
semantic relation R, where w1 and w2 are lexical items identified by their orthographical
form. Its current version, PAPEL 3.0¹⁷, contains about 190,000 triples of different
types, connecting about 100,000 unique lexical items.
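To make the triple-based structure concrete, the following is a minimal Python sketch of a lexical network stored as relational triples. It is not PAPEL's actual file format or API: the class names (`Triple`, `LexicalNetwork`), the relation labels (`hypernym-of`, `synonym-of`) and the example words are all hypothetical and chosen only for illustration. As in the description above, lexical items are plain orthographic forms, so the network is not sense-aware.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One relation instance t = (w1, R, w2). Hypothetical type, for illustration."""
    w1: str        # first lexical item, identified only by its orthographic form
    relation: str  # name of the semantic relation R (labels here are made up)
    w2: str        # second lexical item, also a plain orthographic form


class LexicalNetwork:
    """A set of triples, indexed by their first argument for simple lookups."""

    def __init__(self, triples):
        self.triples = set(triples)  # duplicate triples collapse into one
        self._by_first = defaultdict(set)
        for t in self.triples:
            self._by_first[t.w1].add(t)

    def lexical_items(self):
        """All unique lexical items that occur in at least one triple."""
        return {w for t in self.triples for w in (t.w1, t.w2)}

    def related(self, word, relation=None):
        """Second arguments of triples starting at `word`, optionally by relation."""
        return {
            t.w2
            for t in self._by_first.get(word, ())
            if relation is None or t.relation == relation
        }


# Made-up example triples; they are NOT taken from PAPEL.
network = LexicalNetwork([
    Triple("animal", "hypernym-of", "cão"),
    Triple("cão", "synonym-of", "cachorro"),
    Triple("cão", "synonym-of", "cachorro"),  # duplicate, stored only once
])

print(len(network.triples), "triples,", len(network.lexical_items()), "lexical items")
print(network.related("cão", "synonym-of"))
```

Because each node is just a word form, a polysemous word would appear as a single node that mixes the relations of all its senses, which is the main contrast with a synset-based wordnet.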
Portuguese LKBs in numbers
As we did for the English LKBs, here we put the Portuguese
LKBs side by side. Table 3.3 characterises the LKBs according to their construction
and availability. Table 3.4 shows the number of included lexical items, according
to their POS. Especially due to its automatic construction, PAPEL is clearly the
¹³ Available from http://openthesaurus.caixamagica.pt/ (August 2012)
¹⁴ See http://www.openoffice.org/ (August 2012)
¹⁵ Available from http://pt.wiktionary.org/ (August 2012)
¹⁶ Available from http://www.dicionario-aberto.net/ (August 2012)
¹⁷ Available from http://www.linguateca.pt/PAPEL/ (August 2012)