24.07.2013 Views

Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...


3.2. Lexical-Semantic Information Extraction

give the following sentences, taken from the British National Corpus (BNC, see footnote 23), to illustrate their assumptions:

1. This is not the case with sugar, honey, grape must, cloves and other spices which increase its merit.
   ⇒ {clove hyponym of spice}

2. Ships laden with nutmeg or cinnamon, cloves or coriander once battled the Seven Seas to bring home their precious cargo.
   ⇒ {nutmeg hyponym of spice}
   ⇒ {cinnamon hyponym of spice}
   ⇒ {coriander hyponym of spice}
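The two sentences above can be read as combining two cues: an "X and other Y" pattern that yields a seed relation, and a coordination in which one term already has a known hypernym. The following is a minimal sketch of that reading, not the cited authors' implementation. The function names, the regular expressions and the naive singularisation are all assumptions made for illustration; a real system would rely on POS tagging or parsing to find coordinated nouns.

```python
import re

# "X and other Y": the word right before "and other" is read as a hyponym of Y.
AND_OTHER = re.compile(r"(\w+) and other (\w+)")
# A run of single words joined by ", ", " or " or " and ". This is a toy
# stand-in for a real noun-coordination detector (hypothetical simplification).
COORDINATION = re.compile(r"\b\w+(?:(?:, | or | and )\w+)+")


def singular(word):
    """Very naive singularisation: lowercase and drop one trailing 's'."""
    word = word.lower()
    return word[:-1] if word.endswith("s") else word


def hyponyms_from_and_other(sentence):
    """Return (hyponym, hypernym) pairs found by the 'and other' pattern."""
    return {(singular(x), singular(y)) for x, y in AND_OTHER.findall(sentence)}


def propagate_by_coordination(sentence, known):
    """Give every coordinated term the hypernym of a co-ordinated known term."""
    new = set()
    for match in COORDINATION.finditer(sentence):
        terms = [singular(t) for t in re.split(r", | or | and ", match.group(0))]
        hypernyms = {known[t] for t in terms if t in known}
        for term in terms:
            if term not in known:
                new.update((term, h) for h in hypernyms)
    return new


s1 = ("This is not the case with sugar, honey, grape must, cloves and other "
      "spices which increase its merit.")
s2 = ("Ships laden with nutmeg or cinnamon, cloves or coriander once battled "
      "the Seven Seas to bring home their precious cargo.")

seed = hyponyms_from_and_other(s1)                    # seed relation(s)
expanded = propagate_by_coordination(s2, dict(seed))  # relations via coordination
print(sorted(seed))
print(sorted(expanded))
```

On these two sentences the sketch yields the same four relations listed above; on arbitrary text the word-level coordination pattern would over-generate, which is why it is only meant as an illustration.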

Using the correct relations extracted without the LSA filter, for each hyponym, the top ten most similar words were collected and tested for having the same hypernym. This resulted in a slight improvement of precision, while the number of relations obtained was ten times higher.

Berland and Charniak (1999) present work on the extraction of part-of relations from a corpus, using handcrafted patterns. In a similar fashion to Hearst (1992), seed instances are used to infer linguistic patterns, which are then used to acquire new relation instances. In the end, the extracted instances are ranked according to their log-likelihood (Dunning, 1993).
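To make the ranking step more concrete, here is a minimal sketch of the generic log-likelihood ratio statistic (G²) for a 2x2 contingency table, as described by Dunning (1993). It is not necessarily the exact scoring used by Berland and Charniak; the function name, the table layout and all counts below are hypothetical and chosen only for illustration.

```python
import math


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 for a 2x2 table of co-occurrence counts.

    k11: whole and part seen together in a pattern
    k12: whole seen with some other word
    k21: part seen with some other whole
    k22: neither
    """
    observed = ((k11, k12), (k21, k22))
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    total = rows[0] + rows[1]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = rows[i] * cols[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


# Invented counts for two candidate part-of instances (illustration only).
candidates = {
    ("car", "wheel"): (30, 70, 20, 9880),
    ("car", "day"): (5, 95, 495, 9405),
}
ranked = sorted(candidates,
                key=lambda pair: log_likelihood_ratio(*candidates[pair]),
                reverse=True)
print(ranked)
```

With these invented counts, the second candidate co-occurs exactly as often as chance would predict, so its score is zero and it is ranked last.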

Girju and Moldovan (2002) followed Hearst’s method to discover lexical-syntactic patterns expressing causation. Given that only some categories of nouns (e.g. states of affairs) can be associated with causation, extracted relations were later validated against semantic constraints on the relation arguments.

Cimiano and Wenderoth (2007) present an approach for the automatic acquisition of qualia structures (Pustejovsky, 1991), which aim to describe the meaning of lexical elements (earlier presented in section 2.2.5 of this thesis). To mitigate the problem of data sparseness, they propose looking for discriminating patterns on the Web. For each qualia term, a set of search engine queries is generated for each qualia role, based on known lexical-syntactic patterns. The first 50 snippets returned are downloaded and POS-tagged. Then, patterns defined over POS-tags and conveying the qualia role of interest are matched to obtain candidate qualia elements. In the end, the candidates are weighted and ranked according to well-known similarity measures (e.g. Jaccard coefficient, PMI).
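As an illustration of the final weighting step, the sketch below computes the Jaccard coefficient and pointwise mutual information (PMI) from hit counts. How Cimiano and Wenderoth obtain and combine their counts is not detailed here, so the function names, the parameters and every number are assumptions made for this example.

```python
import math


def jaccard(hits_a, hits_b, hits_ab):
    """Jaccard coefficient: joint hits over hits for either term."""
    return hits_ab / (hits_a + hits_b - hits_ab)


def pmi(hits_a, hits_b, hits_ab, total):
    """Pointwise mutual information (base 2), with probabilities
    estimated as hit counts divided by a collection size `total`."""
    return math.log2((hits_ab * total) / (hits_a * hits_b))


# Hypothetical hit counts for a qualia term and two candidate elements:
# (hits for the term, hits for the candidate, joint hits).
term = "knife"
candidates = {"cut": (100, 50, 25), "sing": (100, 200, 2)}
total = 1000  # assumed size of the collection

ranked = sorted(candidates, key=lambda c: jaccard(*candidates[c]), reverse=True)
for cand in ranked:
    counts = candidates[cand]
    print(term, cand, jaccard(*counts), pmi(*counts, total))
```

Either measure can serve as the sort key; PMI tends to favour rare candidates, whereas the Jaccard coefficient is bounded between 0 and 1.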

The main problem of the aforementioned approaches is that they rely on a finite set of handcrafted rules, even though some were discovered with the help of automatic procedures, and are therefore vulnerable to data sparseness. Even though Hearst (1992) says that the six proposed patterns occur frequently, they are unlikely to capture all the occurrences of the target relation(s).

Regarding the manual identification of semantic patterns, Snow et al. (2005) add that it is not very interesting and can be biased by the designer. They propose a supervised approach, trained with WordNet, to discover hyponymy patterns, and an automatic classifier that decides if a hypernymy relation holds between two nouns. Their procedure works as follows:

23 See http://www.natcorp.ox.ac.uk/ (August 2012)
