24.07.2013

Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...


Chapter 4. Acquisition of Semantic Relations

where transitivity was applied to the synonymy relations of PAPEL, giving rise to some inconsistencies such as the following:

• queda synonym-of ruína ∧ queda synonym-of habilidade → ruína synonym-of habilidade

The problem occurs because one sense of queda is the result of falling, while another denotes having some skill or aptitude. Combining those two synonymy pairs through transitivity, we therefore obtain that ruína (ruin) is the same as habilidade (ability, skill), two words that are almost opposites.
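The sketch below, a minimal illustration rather than the procedure actually used in PAPEL, shows why transitivity misbehaves when terms carry no sense information. It assumes synonymy is stored as plain (term, term) pairs; the names `SYNONYMY`, `synonym_groups` and `inferred_pairs` are hypothetical. Applying symmetry and transitivity amounts to computing connected components, done here with a small union-find.

```python
from itertools import combinations

# Term-based synonymy pairs from the example above: no sense labels attached.
SYNONYMY = [
    ("queda", "ruína"),       # queda as 'fall, downfall'
    ("queda", "habilidade"),  # queda as 'knack, skill'
]

def synonym_groups(pairs):
    """Apply symmetry and transitivity: each connected component becomes one group."""
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    for a, b in pairs:
        parent[find(a)] = find(b)  # union the two components

    groups = {}
    for term in list(parent):
        groups.setdefault(find(term), set()).add(term)
    return list(groups.values())

def inferred_pairs(pairs):
    """Pairs entailed by the closure that were not explicitly stated."""
    explicit = {frozenset(p) for p in pairs}
    inferred = set()
    for group in synonym_groups(pairs):
        for a, b in combinations(sorted(group), 2):
            if frozenset((a, b)) not in explicit:
                inferred.add((a, b))
    return inferred

print(inferred_pairs(SYNONYMY))
```

Because queda is a single node with no sense distinction, both of its senses end up in one component, and the closure reports (habilidade, ruína) as a new synonymy pair, which is the inconsistency described above.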

Nevertheless, since the beginning of the PAPEL project, our choice has been to build a lexical resource in which lexical items are not divided into word senses. That early choice rested on the following points:

• From a linguistic point of view, word senses are not discrete and cannot be separated by clear boundaries (Kilgarriff, 1996; Hirst, 2004). Sense division in dictionaries and lexical ontologies is, most of the time, artificial.

• Following from the previous point, sense granularity in dictionaries and lexical ontologies often differs from lexicographer to lexicographer. As there are no well-defined criteria for dividing meanings, word senses in different resources do not always match (Dolan, 1994; Peters et al., 1998).

• Word sense disambiguation (WSD; see Navigli (2009b) for a survey) is the task of selecting, given the context in which a word occurs, the most adequate of its senses from a sense inventory. However, the previous points confirm that WSD is an ill-defined task, highly dependent on its purpose (Wilks, 2000).

• Dictionaries do not indicate which sense of a word is intended when that word occurs in a definition. After the first version of PAPEL was released, Navigli (2009a) did present a method for disambiguating words in dictionary definitions. Still, given the aforementioned problems with WSD, the term-based structure of PAPEL was kept.

• Finally, in natural language, the study of vagueness is as important as, or even more important than, the study of ambiguity (see e.g. Santos (1997)).

When we started to extract relations from other dictionaries (and thesauri), we confirmed that the senses of words occurring in more than one resource did not match across resources. Moreover, not all definitions in Wiktionary.PT have a sense number, and synonymy lists do not always indicate the corresponding synonymous sense. Since we are extracting information from more than one lexical resource, an alternative would be to align the word senses in different resources (represented as definitions in dictionaries or synsets in thesauri), as others have done (e.g. Vossen et al. (2008); Henrich et al. (2012)). Still, given the aforementioned utility of a lexical resource such as PAPEL, we decided to keep CARTÃO as a term-based resource.

In the following chapters, we explain how the structure of CARTÃO can evolve into a resource that handles word senses. After the additional steps of the ECO approach, the result is Onto.PT, a resource structured in synsets. We recall that this approach is flexible in that it enables the construction (and further augmentation) of a wordnet based on the integration of knowledge from multiple heterogeneous sources and, from this point on, it does not require any additional analysis of the extraction context. The only requirement is that the initial information is represented as tb-triples, which is a fairly standard representation.
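As a rough illustration of that requirement, the sketch below models a tb-triple as two terms linked by a named relation and merges triples coming from several resources. The tab-separated `arg1<TAB>relation<TAB>arg2` line layout, the class `TBTriple`, the helpers `parse_line` and `merge_sources`, and the resource labels are all hypothetical assumptions for illustration, not the format used by CARTÃO or Onto.PT.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TBTriple:
    """Hypothetical term-based triple: two lexical items linked by a named relation.
    Neither argument carries word-sense information."""
    arg1: str
    relation: str
    arg2: str

def parse_line(line: str) -> TBTriple:
    """Parse one triple, assuming a tab-separated 'arg1<TAB>relation<TAB>arg2' layout."""
    arg1, relation, arg2 = line.rstrip("\n").split("\t")
    return TBTriple(arg1, relation, arg2)

def merge_sources(sources):
    """Integrate triples from several resources.

    `sources` maps a resource label to its lines. Identical triples collapse
    into a single key, and the value records which resources support it.
    """
    support = {}
    for label, lines in sources.items():
        for line in lines:
            support.setdefault(parse_line(line), set()).add(label)
    return support

# Usage with two hypothetical resources (labels and contents are illustrative).
merged = merge_sources({
    "resource_a": ["queda\tsynonym-of\truína"],
    "resource_b": ["queda\tsynonym-of\truína", "queda\tsynonym-of\thabilidade"],
})
for triple, labels in merged.items():
    print(triple.arg1, triple.relation, triple.arg2, sorted(labels))
```

Since every source is reduced to the same triple shape, integrating a new resource in this sketch only means parsing its triples; nothing about where or how each relation was extracted is needed.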
