24.07.2013 Views

Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...


Chapter 3. Related Work

1. Extract all hypernym-hyponym pairs from WordNet.

2. For each pair, find sentences in which both words occur.

3. Parse the sentences, and automatically extract, from the resulting parse trees, patterns that are good cues for hypernymy.

4. Train a hypernymy classifier using these patterns as features.
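The first three steps can be illustrated with a small sketch. It is a simplification under stated assumptions, not the original procedure: the pairs and sentences are invented toy data rather than WordNet entries and corpus text, the patterns are the surface tokens between the two words rather than paths in a parse tree, and classifier training (step 4) is left out. The names `KNOWN_PAIRS`, `CORPUS`, `connecting_pattern` and `extract_patterns` are hypothetical.

```python
import re
from collections import Counter

# Toy stand-in for step 1 (hypothetical data, not taken from WordNet):
# each pair is (hyponym, hypernym).
KNOWN_PAIRS = [("dog", "mammal"), ("oak", "tree"), ("aspirin", "drug"),
               ("jazz", "music"), ("paris", "city")]

# Toy corpus (hypothetical sentences).
CORPUS = [
    "A dog is a mammal.",
    "The oak is a tree.",
    "He took a drug called aspirin.",
    "They play music like jazz.",
    "Paris, a city in France, is crowded.",
]


def tokenize(sentence):
    """Lowercase the sentence and keep words and commas as tokens."""
    return re.findall(r"[a-z]+|,", sentence.lower())


def connecting_pattern(tokens, hypo, hyper, max_gap=4):
    """Return the tokens linking the two words as a pattern string, or None.

    Assumption: a surface string replaces the parse-tree path of step 3.
    """
    if hypo not in tokens or hyper not in tokens:
        return None
    i, j = tokens.index(hypo), tokens.index(hyper)
    if i < j:
        left, right, middle = "HYPO", "HYPER", tokens[i + 1:j]
    else:
        left, right, middle = "HYPER", "HYPO", tokens[j + 1:i]
    if not middle or len(middle) > max_gap:
        return None
    return " ".join([left, *middle, right])


def extract_patterns(pairs, corpus):
    """Steps 2 and 3: count the patterns that connect known pairs."""
    counts = Counter()
    for sentence in corpus:
        tokens = tokenize(sentence)
        for hypo, hyper in pairs:
            pattern = connecting_pattern(tokens, hypo, hyper)
            if pattern:
                counts[pattern] += 1
    return counts


pattern_counts = extract_patterns(KNOWN_PAIRS, CORPUS)
for pattern, count in pattern_counts.most_common():
    print(count, pattern)
```

In a fuller version, the pattern counts collected for each word pair would become the feature vector given to the classifier of step 4.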

Besides rediscovering the six Hearst patterns, which gives quantitative support to Hearst’s intuition, Snow et al. (2005) were able to discover the following additional patterns:

• HYPERNYM like HYPONYM

• HYPERNYM called HYPONYM

• HYPONYM is a HYPERNYM

• HYPONYM, a HYPERNYM
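As a rough illustration of how such patterns can be applied to text, the sketch below encodes the four of them as regular expressions and returns candidate hyponym-hypernym pairs. Several things here are assumptions made for the example: the argument order of each pattern, the restriction to single-word arguments (real systems match noun phrases), and the names `PATTERNS` and `candidate_pairs`.

```python
import re

# Each entry pairs a regular expression with the pattern's name.
# Assumption: the named groups encode which side is the hyponym and
# which is the hypernym; arguments are limited to single words.
PATTERNS = [
    (re.compile(r"\b(?P<hyper>\w+) like (?P<hypo>\w+)"), "like"),
    (re.compile(r"\b(?P<hyper>\w+) called (?P<hypo>\w+)"), "called"),
    (re.compile(r"\b(?P<hypo>\w+) is an? (?P<hyper>\w+)"), "is a"),
    (re.compile(r"\b(?P<hypo>\w+), an? (?P<hyper>\w+)"), ", a"),
]


def candidate_pairs(text):
    """Return (hyponym, hypernym, pattern name) for every pattern match."""
    pairs = []
    for regex, name in PATTERNS:
        for match in regex.finditer(text):
            pairs.append(
                (match.group("hypo").lower(), match.group("hyper").lower(), name)
            )
    return pairs


# Hypothetical example sentences.
for sentence in ["They play music like jazz.",
                 "He took a drug called aspirin.",
                 "An oak is a tree.",
                 "Paris, a city in France."]:
    print(sentence, candidate_pairs(sentence))
```

Matching on raw strings like this is noisy (for instance, "like" is also a verb), which is one reason such patterns are better used as classifier features than as hard rules.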

Girju et al. (2006) also used a heavily supervised approach based on WordNet, this time for discovering part-of relations. The same authors presented a similar approach for the extraction of manner-of relations (Girju et al., 2003). However, as WordNet does not contain this kind of relation, the classifier was trained with a corpus where these relations were manually annotated.

Although supervised LSIE has been quite successful, a fully supervised approach is not suitable when no sufficiently large set of reliable relations of a given type is available, unless one is willing to create such a set. An alternative is to use a bootstrapping approach, as in the Espresso algorithm (Pantel and Pennacchiotti, 2006), which acquires semantic relations with minimal supervision. The main contribution of Pantel and Pennacchiotti (2006) is the exploitation of broad-coverage noisy patterns (generic patterns), which increase recall but typically have low precision (e.g. X of Y for part-of). Espresso starts with a small set of seed instances, I, and iterates through three main phases: (i) pattern induction, (ii) pattern ranking/selection, and (iii) instance extraction, briefly described below:

1. Infer a set of surface patterns, P, which are strings that, in the corpus, connect the arguments of the seed instances.

2. Rank each inferred pattern, p ∈ P, according to its reliability, rπ(p), given by its average strength of association across each instance i ∈ I:

$$r_\pi(p) = \frac{\sum_{i \in I} \left( \frac{pmi(i,p)}{max_{pmi}} \ast r_\iota(i) \right)}{|I|}$$

Here, maxpmi is the maximum PMI between all patterns and all instances, and pmi(i, p), the PMI between a pattern p and an instance i that relates terms x and y, is computed from the ratio between the frequency of p connecting x and y, |x, p, y|, and the number of co-occurrences of x and y times the number of occurrences of p:
