Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...
3.2. Lexical-Semantic Information Extraction
give the following sentences, taken from the British National Corpus (BNC, see footnote 23), to
illustrate their assumptions:
1. This is not the case with sugar, honey, grape must, cloves and other spices
which increase its merit.
⇒ {clove hyponym of spice}
2. Ships laden with nutmeg or cinnamon, cloves or coriander once battled the
Seven Seas to bring home their precious cargo.
⇒ {nutmeg hyponym of spice}
⇒ {cinnamon hyponym of spice}
⇒ {coriander hyponym of spice}
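The two assumptions illustrated by these sentences can be sketched in Python as follows. This is a minimal illustration, not the cited authors' implementation: the regular expressions, the naive singularisation and all function names are assumptions made here, and a real system would work over POS-tagged or chunked text rather than raw strings.

```python
import re

# Simplified surface pattern "X and/or other Y" (an assumption for illustration).
AND_OTHER = re.compile(r"\b(\w+),? (?:and|or) other (\w+)")

# A run of single words joined by ", ", " or " or " and " (naive coordination).
COORDINATION = re.compile(r"\b\w+(?:(?:, | or | and )\w+)+")

SENTENCE_1 = ("This is not the case with sugar, honey, grape must, cloves "
              "and other spices which increase its merit.")
SENTENCE_2 = ("Ships laden with nutmeg or cinnamon, cloves or coriander once "
              "battled the Seven Seas to bring home their precious cargo.")


def singular(word):
    """Naive singularisation: lower-case and drop one trailing 's'."""
    word = word.lower()
    return word[:-1] if word.endswith("s") and not word.endswith("ss") else word


def pattern_hyponyms(sentence):
    """Return (hyponym, hypernym) pairs matched by the 'and other' pattern."""
    return {(singular(x), singular(y)) for x, y in AND_OTHER.findall(sentence)}


def coordination_hyponyms(sentence, known):
    """Propagate a known hypernym to the other members of a coordination."""
    found = set()
    for match in COORDINATION.finditer(sentence):
        nouns = [singular(w) for w in re.split(r", | or | and ", match.group())]
        hypernyms = {known[n] for n in nouns if n in known}
        found |= {(n, h) for n in nouns if n not in known for h in hypernyms}
    return found


# Sentence 1 yields the seed relation; sentence 2 extends it by coordination.
known = dict(pattern_hyponyms(SENTENCE_1))
print(sorted(coordination_hyponyms(SENTENCE_2, known)))
```

The first function only takes the noun immediately preceding "and other", which matches the single relation shown for sentence 1; the second relies on the assumption that coordinated nouns tend to share a hypernym.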
Using the correct relations extracted without the LSA filter, for each hyponym,
the top ten most similar words were collected and tested for having the same hypernym.
This resulted in a slight improvement of precision, while the number of
relations obtained was ten times higher.
Berland and Charniak (1999) present work on the extraction of part-of relations
from a corpus, using handcrafted patterns. In a similar fashion to Hearst (1992),
seed instances are used to infer linguistic patterns, which are then used to acquire new relation
instances. In the end, the extracted instances are ranked according to their log-likelihood
(Dunning, 1993).
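As an illustration of this kind of ranking, the sketch below computes the log-likelihood ratio statistic for a 2x2 contingency table of co-occurrence counts and sorts candidate pairs by it. How the contingency table was built in the cited work is not stated here, so the table layout, the function names and the input format are assumptions.

```python
import math


def log_likelihood(k11, k12, k21, k22):
    """G^2 = 2 * sum(observed * ln(observed / expected)) for a 2x2 table.

    k11: both words together, k12: first word without the second,
    k21: second word without the first, k22: neither word.
    """
    total = k11 + k12 + k21 + k22
    row1, row2 = k11 + k12, k21 + k22
    col1, col2 = k11 + k21, k12 + k22

    def term(observed, row, col):
        if observed == 0:
            return 0.0  # the limit of x * ln(x) as x goes to 0 is 0
        expected = row * col / total
        return observed * math.log(observed / expected)

    return 2.0 * (term(k11, row1, col1) + term(k12, row1, col2)
                  + term(k21, row2, col1) + term(k22, row2, col2))


def rank_by_log_likelihood(tables):
    """Sort candidate pairs by decreasing G^2.

    `tables` maps a (part, whole) pair to its (k11, k12, k21, k22) counts.
    """
    scored = {pair: log_likelihood(*counts) for pair, counts in tables.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)
```

A table in which the two words are independent scores zero, and the score grows as the observed counts depart from what independence would predict.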
Girju and Moldovan (2002) followed Hearst’s method to discover lexical-syntactic
patterns expressing causation. Given that only some categories of nouns (e.g.
states of affairs) can be associated with causation, extracted relations were later
validated against semantic constraints on the relation arguments.
Cimiano and Wenderoth (2007) present an approach for the automatic acquisition
of qualia structures (Pustejovsky, 1991), which aim to describe the meaning
of lexical elements (presented earlier in section 2.2.5 of this thesis). To
mitigate the problem of data sparseness, they propose looking for discriminating
patterns on the Web. For each qualia term, a set of search engine queries for each
qualia role is generated, based on known lexical-syntactic patterns. The first 50
snippets returned are downloaded and POS-tagged. Then, patterns defined over
POS-tags and conveying the qualia role of interest are matched to obtain candidate
qualia elements. In the end, the candidates are weighted and ranked according to
well-known similarity measures (e.g. Jaccard coefficient, PMI).
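The two measures named above can be computed from occurrence counts, for instance search engine hit counts. The sketch below is a generic formulation under that assumption; the function names, the input format and the choice of PMI as the default are illustrative and not taken from the cited work.

```python
import math


def jaccard(hits_a, hits_b, hits_ab):
    """Jaccard coefficient: joint hits over hits for either term."""
    union = hits_a + hits_b - hits_ab
    return hits_ab / union if union else 0.0


def pmi(hits_a, hits_b, hits_ab, total):
    """Pointwise mutual information: log2(p(a, b) / (p(a) * p(b)))."""
    if 0 in (hits_a, hits_b, hits_ab):
        return float("-inf")
    return math.log2(hits_ab * total / (hits_a * hits_b))


def rank_candidates(term_hits, candidates, total, measure="pmi"):
    """Rank candidate qualia elements for one term, best first.

    `candidates` maps each candidate to (candidate_hits, joint_hits);
    `total` is the (assumed) number of indexed documents.
    """
    def score(counts):
        cand_hits, joint = counts
        if measure == "jaccard":
            return jaccard(term_hits, cand_hits, joint)
        return pmi(term_hits, cand_hits, joint, total)

    return sorted(((c, score(v)) for c, v in candidates.items()),
                  key=lambda item: item[1], reverse=True)
```

Both measures reward candidates that co-occur with the term more often than their individual frequencies would suggest; PMI additionally needs an estimate of the collection size.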
The main problem of the aforementioned approaches is that they rely on a finite
set of handcrafted rules, although some were discovered with the help of automatic procedures,
and are therefore vulnerable to data sparseness. Even though Hearst (1992)
says that the six proposed patterns occur frequently, they are unlikely to capture
all the occurrences of the target relation(s).
Regarding the manual identification of semantic patterns, Snow et al. (2005) add
that it is not very interesting and can be biased by the designer. They propose
a supervised approach, trained with WordNet, to discover hyponymy patterns,
and an automatic classifier that decides whether a hypernymy relation holds between two
nouns. Their procedure works as follows:
23 See http://www.natcorp.ox.ac.uk/ (August 2012)