24.07.2013 Views

Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...


Chapter 7

Moving from term-based to synset-based relations

Typical information extraction (IE) systems are capable of acquiring concept instances and information about these concepts from large collections of text. Whether these systems aim for the automatic acquisition of lexical-semantic relations (e.g. Chodorow et al. (1985); Hearst (1992); Pantel and Pennacchiotti (2006)), of knowledge on specific domains (e.g. Pustejovsky et al. (2002); Wiegand et al. (2012)), or the extraction of open-domain facts (e.g. Agichtein and Gravano (2000); Banko et al. (2007); Etzioni et al. (2011)), they typically represent concepts as terms, which are lexical items identified by their lemma. This is also how CARTÃO is structured. There, semantic relations are denoted by relational triples t = {a R b}, where the arguments (a and b) are terms whose meaning is connected by a relation described by R. As we have done throughout this thesis, we refer to this representation as term-based triples (tb-triples).
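
As a rough illustration of the tb-triple representation t = {a R b}, the following Python sketch stores each triple as two lemmas and a relation name. The class name `TbTriple`, the helper `triples_with_term`, the relation labels, and the example triples are all assumptions made for this illustration; they are not taken from CARTÃO.

```python
from typing import List, NamedTuple


class TbTriple(NamedTuple):
    """Term-based triple t = {a R b}: two lemmas connected by a relation."""
    a: str          # first argument, a term identified by its lemma
    relation: str   # name of the relation R (hypothetical labels below)
    b: str          # second argument, also a term


# Invented example triples; "bank" is used in two different senses,
# but a term-based representation cannot tell them apart.
triples = [
    TbTriple("bank", "HYPONYM_OF", "institution"),
    TbTriple("bank", "PART_OF", "river"),
]


def triples_with_term(triples: List[TbTriple], term: str) -> List[TbTriple]:
    """Return every triple in which `term` occurs as an argument."""
    return [t for t in triples if term in (t.a, t.b)]


bank_triples = triples_with_term(triples, "bank")
```

Both invented triples are returned for the lemma "bank", although they concern different meanings of the word, because the only identifier available is the lemma itself.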

The problem is that a simple term is usually not enough to unambiguously refer to a concept, because the same word might have different meanings and different words might have the same meaning. On the one hand, this problem is not severe in the extraction of domain knowledge, where, based on the “one sense per discourse” assumption (Gale et al., 1992), ambiguity is low. On the other hand, when dealing with broad-coverage knowledge, if ambiguities are not handled, it becomes impractical to formalise the extracted information and to accomplish tasks such as inference for discovering new knowledge.

Therefore, to make IE systems more useful, a new step, which can be seen as a kind of WSD, is needed. Originally baptised as ontologising (Pantel, 2005), this step aims at moving from knowledge structured in terms, identified by their orthographical form, towards an ontological structure, organised in concepts, which is done by associating the terms to a representation of their meaning.
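
To make the ambiguity behind ontologising concrete, the sketch below enumerates every synset pair that a single tb-triple could map to. It is only an illustration under stated assumptions: the toy thesaurus, the relation label, and the function names `synsets_with` and `candidate_synset_triples` are hypothetical, and the code does not choose among the candidates, which is the actual task of an ontologising procedure.

```python
from itertools import product

# Hypothetical thesaurus: each synset is a frozenset of lemmas.
thesaurus = [
    frozenset({"bank", "financial institution"}),
    frozenset({"bank", "riverside", "shore"}),
    frozenset({"institution", "establishment"}),
]


def synsets_with(term, thesaurus):
    """Return all synsets that contain `term` as one of their lemmas."""
    return [synset for synset in thesaurus if term in synset]


def candidate_synset_triples(tb_triple, thesaurus):
    """List every (synset, relation, synset) triple compatible with a tb-triple.

    Each argument is replaced by each synset that includes it, so an
    ambiguous term multiplies the number of candidates.
    """
    a, relation, b = tb_triple
    pairs = product(synsets_with(a, thesaurus), synsets_with(b, thesaurus))
    return [(sa, relation, sb) for sa, sb in pairs]


candidates = candidate_synset_triples(("bank", "HYPONYM_OF", "institution"), thesaurus)
```

Because "bank" belongs to two synsets in this toy thesaurus and "institution" to one, the invented triple yields two candidate synset-based triples, and some further criterion is needed to decide which of them is valid.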

After the steps presented in the previous chapters, we are left with a lexical network, CARTÃO, with tb-triples extracted from text (chapter 4), and with a thesaurus, with synsets (chapters 5 and 6). While the synsets can be seen as concepts and their possible lexicalisations, the identification of the correct sense(s) of the arguments of a tb-triple for which the relation is valid is not straightforward. However, whereas most WSD techniques rely on the context where the words to be disambiguated occur to find their most adequate sense, the tb-triples do not provide their extraction context. While we could recover the context for some of
