24.07.2013 Views

Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...


Chapter 3. Related Work

6. {,} especially { ,}* {and | or}

... most European countries, especially France, England, and Spain.

⇒ {France hyponym of European country}, {England hyponym of European country}, {Spain hyponym of European country}

Inspired by the work of Hearst (1992), Freitas (2007) discusses the extraction of hypernymy relations from Portuguese corpora. In her work, some Hearst patterns were adapted to Portuguese, which resulted in the following:

• {tais} como {, ... , (e | ou) }

A tentativa posterior de clonar outros mamíferos tais como camundongos, porcos, bezerros,....

⇒ {camundongos hyponym of mamíferos}, {porcos hyponym of mamíferos}, {bezerros hyponym of mamíferos}

• {, }* {,} (e | ou) outros

... a experiência subjetiva com o LSD-25 e outros alucinógenos.

⇒ {LSD-25 hyponym of alucinógeno}

• tipos de : { , ... ,} (e | ou)

Existem dois tipos de cromossomos gigantes: cromossomos politênicos e cromossomos plumulados.

⇒ {cromossomos politênicos hyponym of cromossomos}, {cromossomos plumulados hyponym of cromossomos}

• chamad(o|os|a|as) {de}

... a alta frequência da doença mental chamada esquizofrenia.

⇒ {esquizofrenia hyponym of doença mental}
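As an illustration of how such a lexical-syntactic pattern might be applied, the following is a minimal sketch of the "tais como" pattern as a regular expression. It is not the implementation of Freitas (2007): the slot positions (hypernym before "tais como", hyponyms after it), the name `extract_tais_como`, and the approximation of a noun phrase by a single word are all assumptions made here. A real system would rely on a chunker or parser to delimit noun phrases and on lemmatisation to normalise the extracted terms.

```python
import re

# Assumption: a noun phrase is approximated by one word (letters, digits, hyphens).
WORD = r"[^\W\d_][\w-]*"

# Assumed reading of the pattern:
#   HYPERNYM {tais} como HYPONYM {, HYPONYM}* {(e | ou) HYPONYM}
TAIS_COMO = re.compile(
    rf"(?P<hyper>{WORD}) (?:tais )?como "
    rf"(?P<hypos>{WORD}(?:, {WORD})*(?:,? (?:e|ou) {WORD})?)"
)


def extract_tais_como(sentence):
    """Return (hyponym, hypernym) pairs matched by the assumed pattern."""
    pairs = []
    for match in TAIS_COMO.finditer(sentence):
        # Split the enumeration on commas and on the final "e" / "ou".
        hyponyms = re.split(r",? (?:e|ou) |, ", match.group("hypos"))
        pairs.extend((h, match.group("hyper")) for h in hyponyms if h)
    return pairs
```

Applied to the first example sentence above, this sketch pairs each of "camundongos", "porcos" and "bezerros" with "mamíferos". Because "como" is also a frequent comparative in Portuguese, the optional "tais" makes the expression permissive, so some filtering of the output would be needed in practice.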

Also for the extraction of hypernyms, Caraballo (1999) proposed a combination of pattern detection and a clustering method where noun candidates are obtained from a corpus using data on conjunctions and appositives. A co-occurrence matrix for all nouns is used. It contains a vector for each noun in the corpus, with the number of times it co-occurs, in a conjunction or appositive, with each other noun. If v and w are the vectors of two nouns, the similarity between them is calculated as below, which can be seen as a variant of LSA (cosine similarity):

$$\cos(v, w) = \frac{v \cdot w}{|v|\,|w|} \qquad (3.3)$$
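To make Equation 3.3 concrete, the sketch below builds sparse co-occurrence vectors from a list of noun pairs and computes their cosine. The function names `cooccurrence_vectors` and `cosine`, the input format (one pair per conjunction or appositive occurrence), and the choice of returning 0.0 for an empty vector are assumptions made for illustration. They are not taken from Caraballo (1999).

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(pairs):
    """Build one sparse count vector per noun from (noun, noun) pairs.

    Assumption: each pair stands for one co-occurrence of the two nouns
    in a conjunction or appositive.
    """
    vectors = defaultdict(Counter)
    for a, b in pairs:
        vectors[a][b] += 1
        vectors[b][a] += 1
    return vectors


def cosine(v, w):
    """Cosine between two sparse count vectors, as in Equation 3.3."""
    dot = sum(count * w.get(noun, 0) for noun, count in v.items())
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    norm_w = math.sqrt(sum(c * c for c in w.values()))
    if norm_v == 0 or norm_w == 0:
        return 0.0  # convention chosen here for empty vectors
    return dot / (norm_v * norm_w)
```

For instance, with the pairs `("cat", "dog")`, `("cat", "mouse")` and two occurrences of `("dog", "mouse")`, the vectors of "cat" and "dog" share only the "mouse" dimension, so their cosine is 2 / (√2 · √5).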

In a post-processing step, Hearst-like patterns are used for finding hypernym candidates, which, if appropriate, are placed as common parent nodes for clusters.

Cederberg and Widdows (2003) used a similar variant of LSA to improve the precision and recall of hyponymy relations extracted from a corpus using Hearst-like patterns. Given that a hyponym and its hypernym are expected to be similar, LSA is used to compute the similarity of the terms in each extracted relation. While the precision of a random sample of extracted relations was 40%, the precision of the 100 relations with the highest similarity was 58%, which suggests that this method is effective for reducing errors.
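The filtering idea can be sketched as a ranking step: score each extracted pair by the similarity of its two terms and keep only the best-scoring ones. The function `top_k_by_similarity` and its parameters are hypothetical names. The similarity function is assumed to be supplied by the caller, for example a cosine over LSA vectors, and this is not a reproduction of the authors' procedure.

```python
def top_k_by_similarity(relations, similarity, k):
    """Keep the k (hyponym, hypernym) pairs whose two terms are most similar.

    `similarity` is any callable taking two terms and returning a number;
    it is assumed to be provided by the caller.
    """
    ranked = sorted(relations, key=lambda pair: similarity(*pair), reverse=True)
    return ranked[:k]


# Hypothetical scores, invented only to show the call; not data from the paper.
toy_scores = {
    ("France", "country"): 0.8,
    ("Spain", "country"): 0.7,
    ("table", "country"): 0.1,
}
kept = top_k_by_similarity(
    list(toy_scores), lambda hypo, hyper: toy_scores[(hypo, hyper)], k=2
)
```

Here `kept` retains the two pairs with the highest toy scores and discards the implausible one. The cut-off `k` trades recall for precision, which is the trade-off visible in the figures reported above.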

Furthermore, as most of the potential hyponymy relations that could be extracted are not expressed by the six Hearst patterns, Cederberg and Widdows (2003) improved the recall of their method using coordination as a cue for similarity. They
