Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...
78 Chapter 4. Acquisition of Semantic Relations<br />
where transitivity was applied to the synonymy relations of PAPEL, giving rise to<br />
inconsistencies such as the following:<br />
• queda synonym-of ruína ∧ queda synonym-of habilidade<br />
→ ruína synonym-of habilidade<br />
The problem occurs because one sense of queda is the result of falling, while<br />
another is having some skill. Combining the two by transitivity therefore yields that<br />
ruína (ruin) is the same as habilidade (ability, skill), although the two are almost opposites.<br />
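The inconsistency above can be reproduced with a minimal sketch, assuming synonymy is stored as plain term pairs with no sense information. The function name `synonym_clusters` and the pair list are illustrative and are not part of PAPEL.

```python
from collections import defaultdict

def synonym_clusters(pairs):
    """Group terms by treating synonymy as symmetric and transitive."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    clusters = []
    for start in graph:
        if start in seen:
            continue
        # Depth-first traversal collects every term reachable from `start`.
        cluster = set()
        stack = [start]
        while stack:
            term = stack.pop()
            if term in cluster:
                continue
            cluster.add(term)
            stack.extend(graph[term] - cluster)
        seen |= cluster
        clusters.append(cluster)
    return clusters

# The two synonymy relations from the example, taken at term level.
pairs = [("queda", "ruína"), ("queda", "habilidade")]
clusters = synonym_clusters(pairs)
print(clusters)
```

Because both senses of queda share a single node, ruína and habilidade end up in the same cluster, which is exactly the unwanted inference.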
Nevertheless, since the beginning of the PAPEL project, we chose to build<br />
a lexical resource where lexical items are not divided into word senses. That early<br />
choice rested on the following:<br />
• From a linguistic point of view, word senses are not discrete and cannot be<br />
separated by clear boundaries (Kilgarriff, 1996; Hirst, 2004). Sense division<br />
in dictionaries and lexical ontologies is most of the time artificial.<br />
• Following the previous point, the sense granularity in dictionaries and lexical<br />
ontologies often differs from lexicographer to lexicographer. As there are<br />
no well-defined criteria for the division of meanings, word senses in different<br />
resources do not always match (Dolan, 1994; Peters et al., 1998).<br />
• Word sense disambiguation (WSD, see Navigli (2009b) for a survey) is the task<br />
of selecting, given the context where a word occurs, the most adequate of its<br />
senses from a sense inventory. However, the previous points confirm that WSD<br />
is an ill-defined task that depends heavily on its purpose (Wilks, 2000).<br />
• Dictionaries do not provide the sense corresponding to a word occurring in a<br />
definition. After the first version of PAPEL was released, Navigli (2009a)<br />
presented a method for disambiguating words in dictionary definitions.<br />
Still, given the aforementioned problems with WSD, the term-based structure<br />
of PAPEL was kept.<br />
• Finally, in natural language, the study of vagueness is as important as, or even more<br />
important than, the study of ambiguity (see e.g. Santos (1997)).<br />
When we started to extract relations from other dictionaries (and thesauri), we<br />
confirmed that the senses of words occurring in more than one resource did not match<br />
across resources. Moreover, not all definitions in Wiktionary.PT have a sense<br />
number, and synonymy lists do not always indicate the corresponding synonymous<br />
sense. Since we are extracting information from more than one lexical resource, an<br />
alternative would be to align the word senses in different resources (represented as<br />
definitions in dictionaries or synsets in thesauri), as others did (e.g. Vossen et al.<br />
(2008); Henrich et al. (2012)). Still, given the aforementioned utility of a lexical<br />
resource such as PAPEL, we decided to keep CARTÃO as a term-based resource.<br />
In the following chapters, we explain how the structure of CARTÃO can evolve<br />
into a resource that handles word senses. After the additional steps of the ECO<br />
approach, the result is Onto.PT, a resource structured in synsets. We recall that<br />
this approach is flexible in that it enables the construction (and further<br />
augmentation) of a wordnet, based on the integration of knowledge from multiple<br />
heterogeneous sources and, from this point on, it does not require an additional analysis<br />
of the extraction context. The only requirement is that the initial information is<br />
represented as tb-triples, which is a fairly standard representation.
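
As a rough illustration of that requirement, the sketch below assumes a tb-triple is a term-based triple of the form (term, relation, term), matching the shape of the queda example earlier in this section. The class `TbTriple`, the function `merge_sources`, and the source labels `source_a` and `source_b` are hypothetical names introduced here; they do not describe the actual CARTÃO or ECO implementation.

```python
from typing import NamedTuple

class TbTriple(NamedTuple):
    """Assumed shape of a term-based triple: two terms linked by a relation name."""
    arg1: str
    relation: str
    arg2: str

def merge_sources(sources):
    """Map each distinct triple to the set of source labels that contain it."""
    merged = {}
    for name, triples in sources.items():
        for triple in triples:
            merged.setdefault(triple, set()).add(name)
    return merged

# Hypothetical source labels; the triples reuse the example from this section.
sources = {
    "source_a": [
        TbTriple("queda", "synonym-of", "ruína"),
        TbTriple("queda", "synonym-of", "habilidade"),
    ],
    "source_b": [
        TbTriple("queda", "synonym-of", "ruína"),
    ],
}
merged = merge_sources(sources)
for triple, origins in merged.items():
    print(triple.arg1, triple.relation, triple.arg2, sorted(origins))
```

Since every source is reduced to the same triple shape, integrating a new resource in this sketch only means adding another entry to `sources`; no extraction context is consulted after that point.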