Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...
Chapter 4. Acquisition of Semantic Relations
1. Part of a grammar, with rules for extracting hypernymy (HIPERONIMO_DE), part-of/has-part
(PARTE_DE/TEM_PARTE), and purpose-of (FAZ_SE_COM) relations, and the definition
of an empty head (CABECA_VAZIA):
RAIZ ::= HIPERONIMO_DE ...
...
RAIZ ::= CABECA_VAZIA
CABECA_VAZIA ::= parte
...
RAIZ ::= ... usado para FAZ_SE_COM
RAIZ ::= parte de TEM_PARTE
RAIZ ::= ... que contém DET PARTE_DE
2. Dictionary entries (definiendum, POS, definition) and relations extracted using the
previous rules:
candeia nome utensílio doméstico rústico usado para iluminação, com
pavio abastecido a óleo
→ utensílio HIPERONIMO_DE candeia
→ com FAZ_SE_COM candeia
→ iluminação FAZ_SE_COM candeia
espiga nome parte das gramíneas que contém os grãos
→ espiga PARTE_DE gramíneas
→ grãos PARTE_DE espiga
3. POS-tagging, cleaning and lemmatisation:
candeia nome utensílio#n doméstico#adj rústico#adj usado#v-pcp
para#prp iluminação#n ,#punc com#prp pavio#n
abastecido#v-pcp a#prp óleo#n
→ utensílio HIPERONIMO_DE candeia
→ iluminação FAZ_SE_COM candeia
espiga nome parte#n de#prp as#art gramíneas#n que#pron-indp
contém#v-fin os#art grãos#n
→ espiga PARTE_DE gramínea
→ grão PARTE_DE espiga
Figure 4.2: Extraction of semantic relations from dictionary definitions.
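The three steps in Figure 4.2 can be approximated in a few lines of code. The sketch below is not the grammar or parser used in this work: it replaces the grammar rules with hand-written checks over POS-tagged tokens, and the names `extract`, `next_noun`, `EMPTY_HEADS` and `LEMMAS` are hypothetical, as is the toy lemma table. Relation names are joined with underscores so that they can be used as identifiers. Requiring the relation argument to be a noun is one possible way of discarding a spurious triple such as the one whose argument is the preposition "com".

```python
# Sketch only: hand-written checks over POS-tagged tokens, not the thesis grammar.
EMPTY_HEADS = {"parte"}  # empty heads (CABECA_VAZIA) listed in the figure
LEMMAS = {"gramíneas": "gramínea", "grãos": "grão"}  # toy lemma table (assumption)


def parse(tagged):
    """Split 'word#tag word#tag ...' into (word, tag) pairs."""
    return [tuple(tok.rsplit("#", 1)) for tok in tagged.split()]


def next_noun(tokens, start):
    """Lemma of the first noun at or after `start`; only articles may be skipped."""
    for word, tag in tokens[start:]:
        if tag == "n":
            return LEMMAS.get(word, word)
        if tag != "art":
            return None
    return None


def extract(definiendum, tagged):
    """Return (arg1, RELATION, arg2) triples found in one tagged definition."""
    tokens = parse(tagged)
    words = [w for w, _ in tokens]
    triples = []
    head, head_tag = tokens[0]
    # Hypernymy: the first word of the definition, if a noun and not an empty head.
    if head_tag == "n" and head not in EMPTY_HEADS:
        triples.append((head, "HIPERONIMO_DE", definiendum))
    for i, word in enumerate(words):
        following = words[i + 1] if i + 1 < len(words) else None
        if word == "usado" and following == "para":
            # "... usado para X": X is the purpose of the definiendum.
            arg = next_noun(tokens, i + 2)
            if arg:
                triples.append((arg, "FAZ_SE_COM", definiendum))
        elif i == 0 and word == "parte" and following == "de":
            # "parte de X": the definiendum is part of X.
            arg = next_noun(tokens, i + 2)
            if arg:
                triples.append((definiendum, "PARTE_DE", arg))
        elif word == "que" and following == "contém":
            # "... que contém X": X is part of the definiendum.
            arg = next_noun(tokens, i + 2)
            if arg:
                triples.append((arg, "PARTE_DE", definiendum))
    return triples


CANDEIA = ("utensílio#n doméstico#adj rústico#adj usado#v-pcp para#prp "
           "iluminação#n ,#punc com#prp pavio#n abastecido#v-pcp a#prp óleo#n")
ESPIGA = ("parte#n de#prp as#art gramíneas#n que#pron-indp "
          "contém#v-fin os#art grãos#n")

for triple in extract("candeia", CANDEIA) + extract("espiga", ESPIGA):
    print(" ".join(triple))
```

Run on the two tagged entries of the figure, this prints the same four relations as step 3. A real grammar would cover many more patterns and inflected forms (for instance "usada para") than these three checks.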
4.2 A large lexical network for Portuguese<br />
The relation acquisition procedure was used to create CARTÃO (Gonçalo Oliveira
et al., 2011), a large term-based lexical-semantic network for Portuguese, extracted
from dictionaries. Given the incompleteness of dictionaries (Ide and Véronis,
1995), we exploited not one but three electronic dictionaries of Portuguese, namely:
• Dicionário PRO da Língua Portuguesa (DLP, 2005), used indirectly, through the results
of the PAPEL project;
• Dicionário Aberto (DA) (Simões and Farinha, 2011; Simões et al., 2012);<br />
• Wiktionary.PT 6.
6 Available from http://pt.wiktionary.org/ (September 2012)