24.07.2013 Views

Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...

34 Chapter 3. Related Work

the ELRA catalog⁹.

In MWN.PT's documentation¹⁰, its authors state that the first version of this resource comprises 17,200 manually validated synsets, which correspond to 21,000 word senses/word forms and 16,000 lemmas, from both the European and Brazilian variants of Portuguese. The MWN.PT synsets are aligned with the translation-equivalent concepts of Princeton WordNet and, transitively, with the MultiWordNets of Italian, Spanish, Hebrew, Romanian and Latin.

MWN.PT synsets are linked by the semantic relations of hypernymy/hyponymy and meronymy (part, member and substance) (Santos et al., 2010). Furthermore, MWN.PT includes the subontologies under the concepts of Person, Organization, Event, Location, and Art works. The authors of MWN.PT claim that their resource covers the top ontology with the Portuguese equivalents of all concepts in the top four layers of Princeton WordNet, of the 98 Base Concepts suggested by the Global Wordnet Association, and of the 164 Core Base Concepts indicated by the EuroWordNet project¹¹. However, MWN.PT only covers nouns, while the 164 Core Base Concepts contain not only 66 concrete and 63 abstract noun synsets, but also 35 abstract verb synsets.

WordNet.Br

WordNet.Br (Dias da Silva et al., 2002) is a wordnet resource for the Brazilian variant of Portuguese. The project consisted of two main development phases. First, a team of three linguists analysed five Brazilian Portuguese dictionaries and two corpora in order to acquire synonymy and antonymy information. This resulted in the manual creation of synsets and of antonymy relations between them, as well as the writing of synset glosses and the selection of sentences in which the synset occurred.

In a second phase (Dias-da Silva et al., 2006), the WordNet.Br synsets were manually aligned with the Princeton WordNet synsets, with the help of bilingual dictionaries. This process followed a strategy similar to the one suggested by the EuroWordNet project. After the alignment, the WordNet.Br synsets that were aligned to Princeton WordNet could inherit the relations of that resource. As a result, WordNet.Br covers the relations of hypernymy, meronymy, cause and entailment.
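The inheritance step described above can be illustrated with a minimal sketch. It is not the procedure actually used for WordNet.Br, whose implementation is not described here; it only assumes that the alignment is available as a mapping from target synsets to Princeton WordNet synsets and that relations are stored as triples. The function name `inherit_relations` and all synset identifiers are hypothetical.

```python
from collections import defaultdict


def inherit_relations(alignment, pwn_relations):
    """Project Princeton WordNet relations onto aligned target synsets.

    alignment: dict mapping a target synset id to a Princeton WordNet synset id.
    pwn_relations: iterable of (source_pwn_id, relation, target_pwn_id) triples.
    Returns a set of (source_target_id, relation, target_target_id) triples.
    """
    # Invert the alignment: one PWN synset may be aligned to several target synsets.
    pwn_to_target = defaultdict(set)
    for target_id, pwn_id in alignment.items():
        pwn_to_target[pwn_id].add(target_id)

    inherited = set()
    for src, relation, dst in pwn_relations:
        # A relation is inherited only when both of its ends are aligned.
        for a in pwn_to_target.get(src, ()):
            for b in pwn_to_target.get(dst, ()):
                inherited.add((a, relation, b))
    return inherited


# Hypothetical identifiers, for illustration only.
alignment = {
    "br:cachorro": "pwn:dog",
    "br:animal": "pwn:animal",
    "br:roncar": "pwn:snore",
}
pwn_relations = [
    ("pwn:dog", "hypernym", "pwn:animal"),
    ("pwn:snore", "entailment", "pwn:sleep"),  # "pwn:sleep" has no aligned synset
]
inherited = inherit_relations(alignment, pwn_relations)
```

In this toy input only the hypernymy link is inherited, because the entailment relation points to a Princeton WordNet synset with no aligned counterpart. This is why, under such a scheme, the relations available in the target wordnet depend on how much of it has been aligned.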

Portuguese thesauri

TeP (Dias-Da-Silva and de Moraes, 2003; Maziero et al., 2008) was originally the synset base of WordNet.Br (Dias da Silva et al., 2002), created during its first development phase. It is maintained by the Núcleo Interinstitucional de Lingüística Computacional (NILC) of the University of São Paulo, in São Carlos, Brazil. Its current version, TeP 2.0¹², is publicly available and contains more than 44,000 lexical items, organised in 19,888 synsets. TeP also contains 4,276 antonymy relations between synsets.

⁹ The European Language Resources Association (ELRA) catalog is available from http://catalog.elra.info/ (August 2012)

¹⁰ Available online from http://mwnpt.di.fc.ul.pt/features.html (August 2012)

¹¹ See more about these lists of concepts in http://www.globalwordnet.org/gwa/ewn_to_bc/topont.htm (August 2012)

¹² Available from http://www.nilc.icmc.usp.br/tep2/ (August 2012)
