
Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...


3.1. Lexical Knowledge Bases

OpenThesaurus.PT¹³ (hereafter, OT.PT) is the Portuguese version of a collaborative thesaurus initiative (Naber, 2004). It is approximately four times smaller than TeP. OT.PT contains 13,258 lexical items, organised in 4,102 synsets, but the project has not had any significant development since 2006. This resource is mainly used in the OpenOffice¹⁴ word processor for suggesting synonyms.

Electronic dictionaries

Several Portuguese dictionaries are available for online queries. Here we mention two of them which, besides containing some additional explicit semantic markup, are public domain and thus freely available for download and use.

Wiktionary.PT¹⁵ is a collaborative dictionary by the Wikimedia Foundation where, besides the typical dictionary information, it is possible to add information on semantic relations for each entry. For Portuguese, however, this resource is still small and, besides other problems, most entries do not have information about semantic relations. In May 2012, Wiktionary.PT contained almost 180,000 entries. However, as all Wiktionaries are multilingual, not all of those entries correspond to Portuguese words.

Dicionário Aberto (hereafter DA; Simões and Farinha, 2011; Simões et al., 2012) is the electronic version of an old Portuguese dictionary from 1913, maintained by Alberto Simões at the University of Minho. DA, whose orthography is currently being modernised, has 128,521 entries. Recently, some semantic relations, extracted using simple patterns, were added to DA's interface¹⁶, in a so-called ontology view (Simões et al., 2012).

Lexical-semantic network

PAPEL (Gonçalo Oliveira et al., 2008, 2009, 2010b) is a public domain lexical resource with instances of several types of semantic relations, extracted automatically from Dicionário PRO da Língua Portuguesa (DLP, 2005), a Portuguese dictionary, property of Porto Editora. It was developed by Linguateca. The main differences between PAPEL and a wordnet are that PAPEL was created automatically and is neither structured in synsets nor sense-aware. PAPEL can be seen as a lexical network: it is structured in relational triples t = {w1, R, w2}, each denoting an instance of a semantic relation R, where w1 and w2 are lexical items, identified by their orthographical form. Its current version, PAPEL 3.0¹⁷, contains about 190,000 triples of different types, connecting about 100,000 unique lexical items.
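To make the triple-based structure concrete, the following is a minimal sketch of a lexical network built from relational triples t = {w1, R, w2}. It is an illustration only: the relation labels (`HYPERNYM_OF`, `SYNONYM_OF`, `PART_OF`), the sample words, and the helper names `Triple`, `build_network` and `lexical_items` are all assumptions made for this example, and do not reflect PAPEL's actual relation names or file format.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One relation instance t = {w1, R, w2}; words are plain orthographic forms."""
    w1: str
    relation: str
    w2: str


# Hypothetical sample data: relation labels and words are illustrative only.
TRIPLES = [
    Triple("animal", "HYPERNYM_OF", "cão"),
    Triple("carro", "SYNONYM_OF", "automóvel"),
    Triple("roda", "PART_OF", "carro"),
    Triple("banco", "SYNONYM_OF", "assento"),
    Triple("instituição", "HYPERNYM_OF", "banco"),
]


def lexical_items(triples):
    """Return the set of unique lexical items mentioned in any triple."""
    return {word for t in triples for word in (t.w1, t.w2)}


def build_network(triples):
    """Index every triple under both of its words, giving a simple lexical network."""
    network = defaultdict(list)
    for t in triples:
        network[t.w1].append(t)
        network[t.w2].append(t)
    return network
```

Because items are identified only by their orthographical form, `build_network(TRIPLES)["banco"]` collects the triples for both the "seat" and the "institution" readings under a single node. This is what it means for such a network not to be sense-aware, in contrast with a wordnet, where each reading would belong to a different synset.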

Portuguese LKBs in numbers

As we did for the English LKBs, we now put the Portuguese LKBs side by side. Table 3.3 characterises the LKBs according to their construction and availability. Table 3.4 shows the number of included lexical items, according to their POS. Especially due to its automatic construction, PAPEL is clearly the

¹³ Available from http://openthesaurus.caixamagica.pt/ (August 2012)

¹⁴ See http://www.openoffice.org/ (August 2012)

¹⁵ Available from http://pt.wiktionary.org/ (August 2012)

¹⁶ Available from http://www.dicionario-aberto.net/ (August 2012)

¹⁷ Available from http://www.linguateca.pt/PAPEL/ (August 2012)
