24.07.2013 Views

Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...



Chapter 3. Related Work

Due to the scalability issues of KnowItAll, its authors proposed the paradigm of Open Information Extraction (Banko et al., 2007) (OIE, see section 2.3.2 for more details). OIE systems make a single data-driven pass over a corpus and extract a large set of relational tuples, without requiring any human input.

TextRunner (Banko et al., 2007) is a fully implemented OIE system. First, a small corpus sample is given as input, in order to obtain a classifier that labels candidate extractions as trustworthy or not. Then, all tuples that are potential relations are extracted from the corpus. In the last step, relation names are normalised and each tuple is assigned a probability. TextRunner is more scalable than KnowItAll and has a lower error rate; considering only a set of 10 relation types, both systems extract an identical number of relations. However, since TextRunner does not take the names of the relations as input, its complete set of extractions contains more types of relations.
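
The last step described above (normalising relation names and assigning a probability to each tuple) can be illustrated with a minimal Python sketch. It is not TextRunner's actual assessor: the `MODIFIERS` list, the function names and the noisy-or score with a fixed per-occurrence precision `p_single` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical list of non-essential modifiers dropped during normalisation.
MODIFIERS = {"originally", "also", "really", "definitely"}


def normalise(relation):
    """Lowercase a relation phrase and drop the modifiers listed above."""
    return " ".join(t for t in relation.lower().split() if t not in MODIFIERS)


def assess(extractions, p_single=0.5):
    """Merge tuples that agree after normalisation and score each one.

    The score is a noisy-or stand-in for a probability: a tuple seen n times
    gets 1 - (1 - p_single) ** n, so repeated extractions count as evidence.
    """
    counts = Counter(
        (arg1.lower(), normalise(rel), arg2.lower())
        for arg1, rel, arg2 in extractions
    )
    return {tup: 1 - (1 - p_single) ** n for tup, n in counts.items()}


extractions = [
    ("Edison", "invented", "the phonograph"),
    ("Edison", "originally invented", "the phonograph"),
    ("Calcium", "prevents", "osteoporosis"),
]
scores = assess(extractions)
```

In this sketch the two Edison tuples collapse into one after normalisation, so that tuple receives a higher score than the one seen only once.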

More recently, ReVerb (Etzioni et al., 2011; Fader et al., 2011) was presented, a new and more efficient OIE system that does not need a classifier. ReVerb is based solely on two constraints: (i) a syntactic constraint, which requires that the relation phrase matches a POS regular expression (verb | verb prep | verb word* prep); and (ii) a lexical constraint, which requires that each relevant relation phrase occurs in the corpus with different arguments. The following are examples of ReVerb extractions:

• {Calcium, prevents, osteoporosis}
• {A galaxy, consists of, stars and stellar remnants}
• {Most galaxies, appear to be, dwarf galaxies, which are small}
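
The two constraints can be illustrated with a minimal Python sketch. It is not ReVerb's implementation: the mapping from Penn Treebank tags to the coarse classes V, P and W, the function names and the `min_pairs` threshold are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Coarse classes: V = verb, P = preposition/particle, W = other word.
# "VW*P|V" covers (verb | verb prep | verb word* prep); the longer
# alternative is tried first, so "consists of" is preferred to "consists".
RELATION_PATTERN = re.compile(r"VW*P|V")


def coarse_tag(pos):
    """Map a Penn Treebank tag to V, P, W, or O (anything else)."""
    if pos.startswith("VB"):
        return "V"
    if pos in {"IN", "TO", "RP"}:
        return "P"
    if pos.startswith(("NN", "JJ", "RB", "PRP", "DT")):
        return "W"
    return "O"


def relation_spans(tagged):
    """Syntactic constraint: token spans whose tags match the pattern."""
    tags = "".join(coarse_tag(pos) for _, pos in tagged)
    return [m.span() for m in RELATION_PATTERN.finditer(tags)]


def lexically_valid(tuples, min_pairs=2):
    """Lexical constraint: keep relation phrases seen with enough
    distinct argument pairs (the threshold is an assumed parameter)."""
    pairs = defaultdict(set)
    for arg1, rel, arg2 in tuples:
        pairs[rel].add((arg1, arg2))
    return {rel for rel, args in pairs.items() if len(args) >= min_pairs}


sentence = [("A", "DT"), ("galaxy", "NN"), ("consists", "VBZ"),
            ("of", "IN"), ("stars", "NNS")]
spans = relation_spans(sentence)
phrases = [" ".join(w for w, _ in sentence[s:e]) for s, e in spans]
```

Each character of the tag string stands for one token, so the regular expression spans map directly back to token positions; the lexical filter then discards overly specific relation phrases that occur with only a few argument pairs.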

The Never Ending Language Learner (NELL, Carlson et al. (2010a)) learns from reading contents on the Web and gets better at reading as it reads the same text multiple times. NELL’s starting point is: (i) a set of fundamental categories (e.g., person, sportsTeam, fruit, emotion) and relation types (e.g., playsOnTeam(athlete, sportsTeam), playsInstrument(musician, instrument)) that constitute an ontology; and (ii) a set of 10 to 15 seed examples for each category and relation. Then, NELL reads web pages continuously, 24 hours a day, to extract new category instances and new relations between instances, which are used to populate the ontology. The extracted contents serve as a self-supervised collection of training examples, used in the acquisition of new discriminating patterns. NELL employs coupled training (Carlson et al., 2010b), which combines the simultaneous training of many extraction methods. The following are examples of NELL extractions:

• musicArtistGenre(Nirvana, Grunge)
• tvStationInCity(WLS-TV, Chicago)
• sportUsesEquip(soccer, balls)
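
The self-supervised loop described above, where seed instances yield patterns and patterns yield new instances, can be sketched in Python for a single category. This is a deliberately reduced illustration and not NELL's coupled training, which trains many extractors simultaneously; the two-token left context used as a pattern, the function name and the `min_support` threshold are assumptions.

```python
from collections import defaultdict


def bootstrap(sentences, seeds, rounds=3, min_support=2):
    """Grow a set of category instances from lower-case seed examples.

    A pattern is the pair of tokens preceding a known instance; it is kept
    only if it occurs with at least min_support distinct known instances.
    """
    instances = set(seeds)
    tokenised = [s.lower().split() for s in sentences]
    for _ in range(rounds):
        # Step 1: learn patterns from the instances known so far.
        support = defaultdict(set)
        for toks in tokenised:
            for i in range(2, len(toks)):
                if toks[i] in instances:
                    support[tuple(toks[i - 2:i])].add(toks[i])
        patterns = {p for p, seen in support.items() if len(seen) >= min_support}
        # Step 2: apply the patterns to extract candidate instances.
        found = set()
        for toks in tokenised:
            for i in range(2, len(toks)):
                if tuple(toks[i - 2:i]) in patterns:
                    found.add(toks[i])
        if found <= instances:  # nothing new was learned, so stop
            break
        instances |= found
    return instances


corpus = [
    "she ate an apple",
    "he ate an orange",
    "they ate an avocado",
    "we saw an eagle",
]
fruits = bootstrap(corpus, {"apple", "orange"})
```

Here the context "ate an" is supported by both seeds and therefore promotes "avocado", while "saw an" has no support and "eagle" is left out. A single uncoupled extractor of this kind drifts easily on real text, which is the problem coupled training is meant to constrain.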

The main difference between NELL and OIE systems is that NELL learns extractors for a fixed set of known relations, while an OIE system can extract meaningful information from any kind of corpus, in any domain, as relations are not given as a starting point (Etzioni et al., 2011). This also has an impact on the quantity of extracted knowledge. Still, Mohamed et al. (2011) recently reported how a system like NELL can learn new relation types between already extracted categories.
