Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...
4.2. A large lexical network for Portuguese
that, in DLP, frequently denote the relations we wanted to extract¹³. The grammars
were created manually, after analysing the structure and vocabulary of the
DLP definitions and identifying their regularities.
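Regularities of this kind can be surfaced by counting which word sequences most often open a definition. The sketch below is a hypothetical illustration, not the procedure used to build PAPEL: it assumes whitespace tokenisation and counts only definition-initial n-grams, and the toy definitions are invented for the example.

```python
from collections import Counter


def leading_ngrams(definitions, max_n=3):
    """Count, over all definitions, the n-grams (n = 1..max_n) that open each one."""
    counts = Counter()
    for definition in definitions:
        tokens = definition.lower().split()
        # Assumption: only definition-initial n-grams are counted.
        for n in range(1, min(max_n, len(tokens)) + 1):
            counts[" ".join(tokens[:n])] += 1
    return counts


# Toy definitions (hypothetical, not taken from DLP, DA or Wiktionary.PT).
toy = [
    "o mesmo que gravar",
    "o mesmo que esculpir",
    "parte de uma casa",
    "acto de cortar",
    "acto de gravar",
]
counts = leading_ngrams(toy)
for ngram, freq in counts.most_common(5):
    print(freq, ngram)
```

Frequent openers such as "o mesmo que" or "acto de" then become candidates for manual inspection; deciding which relation each one denotes still requires a human.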
In order to reproduce the grammar creation procedure for extracting relations
from the other dictionaries, we also analysed the structure of their definitions. This
analysis showed that most of the regularities used in the DLP definitions were preserved
in DA and Wiktionary.PT, which meant that the grammars of PAPEL could
be reused with minor changes. Table 4.1 shows the frequency, and the semantic
relation usually denoted, of the most productive n-grams in the three dictionaries,
that is, those that are frequent and suitable for the automatic extraction of
semantic relations. In that table, some patterns extract the direct relation
(e.g. part-of) and others the inverse relation (e.g. has-part) but, during the extraction
procedure, all relations are normalised into the type agreed as direct (e.g.
keyboard has-part key is changed to key part-of keyboard).
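The normalisation step amounts to swapping the arguments and renaming the relation whenever an inverse type is extracted. A minimal sketch, assuming relations are stored as (argument, relation, argument) triples; the `INVERSE_TO_DIRECT` table is hypothetical and lists only two pairs for illustration.

```python
# Hypothetical mapping from inverse relation names to their direct counterparts.
INVERSE_TO_DIRECT = {
    "has-part": "part-of",
    "hyponym-of": "hypernym-of",
}


def normalise(arg1, relation, arg2):
    """Return the triple expressed with the relation type agreed as direct."""
    if relation in INVERSE_TO_DIRECT:
        # Inverse relation: swap the arguments and use the direct name.
        return (arg2, INVERSE_TO_DIRECT[relation], arg1)
    # Already direct (or unknown): keep the triple unchanged.
    return (arg1, relation, arg2)


print(normalise("keyboard", "has-part", "key"))
```

Normalising at extraction time means that the same fact, whether extracted through a direct or an inverse pattern, ends up as a single identical triple.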
The few changes we made to the original grammars of PAPEL include:
• The pattern o mesmo que ('the same as') was used in the extraction of synonymy relations.
• The keywords natural and habitante may appear in either order in the patterns
that extract place-of relations.
• Brazilian Portuguese orthography was taken into account in some patterns, as
it occurs in Wiktionary.PT. For instance, the spellings gênero and ato (European
Portuguese género and acto) were used, respectively, for the extraction of hypernymy
and causation relations.
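These changes can be illustrated with regular expressions, although the PAPEL grammars are not written as regular expressions: the patterns below are hypothetical stand-ins covering only the synonymy pattern, the free order of natural and habitante, and the acto/ato spelling variants. Python's `re` treats `\w` as Unicode-aware for `str` patterns, so accented words are matched.

```python
import re

# Hypothetical regex stand-ins for three of the grammar changes.
PATTERNS = [
    # "o mesmo que X" ('the same as X') signals synonymy.
    ("synonym-of", re.compile(r"o mesmo que (\w+)")),
    # "natural" and "habitante" accepted in either order before "de X".
    ("place-of", re.compile(r"(?:natural|habitante)(?: ou (?:natural|habitante))? de (\w+)")),
    # European "acto" and Brazilian "ato" both introduce causation.
    ("causation", re.compile(r"(?:acto|ato) de (\w+)")),
]


def match_definition(definition):
    """Return (relation, captured word) for the first pattern matching at the start, or None."""
    for relation, pattern in PATTERNS:
        m = pattern.match(definition)  # match() anchors at the start of the string
        if m:
            return relation, m.group(1)
    return None


print(match_definition("habitante ou natural de Lisboa"))
print(match_definition("ato de cortar"))
```

Encoding spelling variants as alternations keeps one rule per relation instead of duplicating rules for each orthography.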
In addition to the static patterns in Table 4.1, two other productive rules were
included in the grammars for extracting relations from the three dictionaries:
• Synonymy can be extracted from definitions consisting of a single word or an
enumeration of words, as in the following example:
talhar verbo gravar, cinzelar ou esculpir<br />
→ gravar synonym-<strong>of</strong> talhar<br />
→ cinzelar synonym-<strong>of</strong> talhar<br />
→ esculpir synonym-<strong>of</strong> talhar<br />
• As most noun definitions follow a genus and differentia structure (see Section
3.2.1), we identify the genus, which may be modified by an adjective, as a
hypernym of the definiendum. The following are examples of the application
of this rule:
islandês nome língua germânica falada na Islândia
→ língua hypernym-<strong>of</strong> islandês<br />
pantera nome grande felino de o gênero Panthera
→ felino hypernym-<strong>of</strong> pantera<br />
The second rule does not apply, however, when the definition starts with a so-called
“empty head” (Chodorow et al., 1985; Guthrie et al., 1990), which is usually
exploited in the extraction of other relations instead. The list of considered “empty
heads” includes words such as acto, efeito (used for the extraction of the causation
relation), qualidade (has-quality), estado (has-state), parte (part-of),
¹³ The grammars of PAPEL are freely available from http://www.linguateca.pt/PAPEL/
(September 2012)