24.07.2013

Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...


4.2. A large lexical network for Portuguese 63

that, in DLP, frequently denote the relations we wanted to extract¹³. The grammars were created manually, after analysing the structure and vocabulary of the DLP definitions and identifying their regularities.
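The regularities were identified by hand. As an illustration only, a first pass of that kind of analysis could count recurring word n-grams across definitions, which is the kind of frequency the text reports for the most productive n-grams. The sketch below is a minimal Python version under our own assumptions (naive lowercasing and whitespace tokenisation, n-gram sizes of 2 and 3, and toy definitions invented for the example); it is not the procedure used to build PAPEL.

```python
from collections import Counter


def count_ngrams(definitions, sizes=(2, 3)):
    """Count word n-grams of the given sizes over a list of definitions."""
    counts = Counter()
    for definition in definitions:
        # Naive lowercasing and whitespace tokenisation (an assumption).
        tokens = definition.lower().split()
        for n in sizes:
            # Slide a window of n tokens over the definition.
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Toy definitions invented for this example, not taken from any dictionary.
definitions = [
    "parte de uma planta",
    "parte de um edifício",
    "o mesmo que casa",
    "o mesmo que lar",
]

for ngram, freq in count_ngrams(definitions).most_common(5):
    print(f"{freq}\t{ngram}")
```

Frequency alone does not separate useful patterns from ordinary function-word sequences, so the frequent n-grams would still have to be inspected manually to decide which relation, if any, each one denotes.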

To reproduce the grammar-creation procedure for extracting relations from the other dictionaries, we also analysed the structure of their definitions. This analysis showed that most of the regularities used in the DLP definitions were preserved in DA and Wiktionary.PT, which meant that the grammars of PAPEL could be reused with minor changes. Table 4.1 shows the most productive n-grams in the three dictionaries, that is, those that are frequent and suitable for exploitation in the automatic extraction of semantic relations, together with their frequency and the semantic relation each usually denotes. In that table, some patterns extract the direct relation (e.g. part-of) and others the inverse relation (e.g. has-part) but, during the extraction procedure, all relations are normalised into the type agreed to be the direct one (e.g. keyboard has-part key is changed to key part-of keyboard).
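The normalisation step can be pictured as a lookup from each inverse relation name to its direct counterpart, with the two arguments swapped. This is a minimal sketch assuming relations are stored as (arg1, relation, arg2) triples; the `Triple` type and `normalise` function are hypothetical names, and apart from the has-part/part-of pair, which comes from the text, the mapping entry is an assumption for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    arg1: str
    relation: str
    arg2: str


# Inverse relation name -> direct relation name.
INVERSE_TO_DIRECT = {
    "has-part": "part-of",        # pair given in the text
    "hyponym-of": "hypernym-of",  # hypothetical name, for illustration only
}


def normalise(triple: Triple) -> Triple:
    """Rewrite an inverse relation as its direct counterpart, swapping arguments."""
    direct = INVERSE_TO_DIRECT.get(triple.relation)
    if direct is None:
        return triple  # already stored in the direct direction
    return Triple(triple.arg2, direct, triple.arg1)


print(normalise(Triple("keyboard", "has-part", "key")))
# Triple(arg1='key', relation='part-of', arg2='keyboard')
```

Storing only one direction of each relation pair keeps duplicates out of the network: keyboard has-part key and key part-of keyboard end up as the same triple.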

The few changes we made to the original grammars of PAPEL include:

• The pattern o mesmo que (“the same as”) was used in the extraction of synonymy relations.

• In the extraction of place-of relations, the keywords natural (“native”) and habitante (“inhabitant”) were allowed to occur in either order.

• Brazilian Portuguese spellings, which occur in Wiktionary.PT, were considered in some patterns. For instance, gênero and ato (European Portuguese género and acto) were used, respectively, for the extraction of hypernymy and causation relations.
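To make the three changes concrete, the sketch below approximates them with Python regular expressions: a synonymy pattern for o mesmo que, a place-of pattern that accepts natural and habitante in either order, and hypernymy and causation patterns that accept both the European and the Brazilian spelling. The definition shapes (e.g. género de X, acto de X), the relation labels and the function name are our own assumptions; PAPEL's grammars are written in their own formalism rather than as regexes, and the sketch only returns the matched argument instead of building full relation triples.

```python
import re

# Hypothetical regex approximations of the changed patterns. PAPEL's grammars
# use their own formalism; the definition shapes below are assumptions.
PATTERNS = [
    # Synonymy: "o mesmo que X" ("the same as X").
    ("synonymy", re.compile(r"o mesmo que (?P<arg>\w+)", re.IGNORECASE)),
    # place-of: the keywords natural and habitante in either order.
    ("place-of", re.compile(
        r"(?:natural ou habitante|habitante ou natural) de (?P<arg>\w+)",
        re.IGNORECASE)),
    # Hypernymy: European (género) and Brazilian (gênero) spellings.
    ("hypernymy", re.compile(r"(?:género|gênero) de (?P<arg>\w+)", re.IGNORECASE)),
    # Causation: European (acto) and Brazilian (ato) spellings.
    ("causation", re.compile(r"(?:acto|ato) de (?P<arg>\w+)", re.IGNORECASE)),
]


def match_patterns(definition):
    """Return (relation, argument) pairs for patterns matching at the start."""
    found = []
    for relation, pattern in PATTERNS:
        m = pattern.match(definition)  # .match anchors at the beginning
        if m:
            found.append((relation, m.group("arg")))
    return found


print(match_patterns("habitante ou natural de Lisboa"))
print(match_patterns("ato de ler"))
```

Folding the variants into alternations keeps one rule per relation, instead of duplicating a rule for each keyword order or spelling.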

In addition to the static patterns in Table 4.1, two other productive rules were included in the grammars for extracting relations from the three dictionaries:

• Synonymy can be extracted from definitions consisting of only one word or an enumeration of words, as in the following example:

talhar verbo gravar, cinzelar ou esculpir
→ gravar synonym-of talhar
→ cinzelar synonym-of talhar
→ esculpir synonym-of talhar

• As most noun definitions consist of a genus and differentia (see Section 3.2.1), we identify the genus, which may be modified by an adjective, as a hypernym of the definiendum. The following are examples of the application of this rule:

islandês nome língua germânica falada na Islândia
→ língua hypernym-of islandês

pantera nome grande felino de o gênero Panthera
→ felino hypernym-of pantera
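The two rules can be sketched together as follows. This is a minimal Python illustration under several assumptions of ours: definitions are plain strings with the part of speech given separately (as nome or verbo in the examples above), an enumeration is split on commas and the conjunctions ou and e, the synonymy rule is tried before the genus rule, and a tiny hypothetical list of pre-nominal adjectives stands in for the part-of-speech information a real grammar would use. The `EMPTY_HEADS` set anticipates the exception discussed in the paragraph that follows and contains only the words listed there; all function and variable names are hypothetical.

```python
import re

# Empty heads named in the text; the full list used in the grammars is longer.
EMPTY_HEADS = {"acto", "efeito", "qualidade", "estado", "parte"}
# Hypothetical stand-in for a POS tagger: adjectives that may precede the genus.
PRENOMINAL_ADJECTIVES = {"grande", "pequeno"}
# Separators of an enumeration: commas and the conjunctions ou / e.
ENUM_SEP = re.compile(r",\s*|\s+ou\s+|\s+e\s+")
WORD = re.compile(r"\w+(?:-\w+)*")


def enumeration_synonyms(headword, definition):
    """Rule 1: a one-word definition or an enumeration of words gives synonyms."""
    words = ENUM_SEP.split(definition.strip())
    if not all(WORD.fullmatch(w) for w in words):
        return []  # some item is a phrase, so this is not a bare enumeration
    return [(w, "synonym-of", headword) for w in words]


def genus_hypernym(headword, definition):
    """Rule 2: the genus of a noun definition is a hypernym of the definiendum."""
    tokens = definition.split()
    if tokens and tokens[0].lower() in PRENOMINAL_ADJECTIVES:
        tokens = tokens[1:]  # skip an adjective placed before the genus
    if not tokens or tokens[0].lower() in EMPTY_HEADS:
        return []  # empty heads are left to other, relation-specific rules
    return [(tokens[0], "hypernym-of", headword)]


def extract(headword, pos, definition):
    """Try the enumeration rule first, then the genus rule for nouns."""
    synonyms = enumeration_synonyms(headword, definition)
    if synonyms:
        return synonyms
    if pos == "nome":
        return genus_hypernym(headword, definition)
    return []


examples = [
    ("talhar", "verbo", "gravar, cinzelar ou esculpir"),
    ("islandês", "nome", "língua germânica falada na Islândia"),
    ("pantera", "nome", "grande felino de o gênero Panthera"),
]
for headword, pos, definition in examples:
    print(extract(headword, pos, definition))
```

Running the three examples from the text through `extract` yields the relations listed above, while a noun definition starting with acto yields nothing, because acto is treated as an empty head.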

The second rule does not apply, however, when the definition starts with a so-called “empty head” (Chodorow et al., 1985; Guthrie et al., 1990), which is usually exploited in the extraction of other relations instead. The list of “empty heads” considered includes words such as acto and efeito (used for the extraction of the causation relation), qualidade (has-quality), estado (has-state), parte (part-of),

¹³ The grammars of PAPEL are freely available from http://www.linguateca.pt/PAPEL/ (September 2012).
