Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...
Chapter 3. Related Work

Due to the scalability issues of KnowItAll, its authors proposed the paradigm of
Open Information Extraction (OIE; Banko et al., 2007; see section 2.3.2 for more
details). OIE systems make a single data-driven pass over a corpus and extract a
large set of relational tuples, without requiring any human input.

TextRunner (Banko et al., 2007) is a fully implemented OIE system. First, a small
corpus sample is given as input in order to train a classifier that labels candidate
extractions as trustworthy or not. Then, all tuples that are potential relations are
extracted from the corpus. In the last step, relation names are normalised and
a probability is assigned to each tuple. TextRunner is more scalable than KnowItAll
and has a lower error rate; considering only a set of 10 relation types, both systems
extract an identical number of relations. However, since TextRunner does not take
the names of the relations as input, its complete set of extractions contains more
types of relations.
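
The last TextRunner step can be illustrated with a minimal sketch. It assumes that normalisation amounts to lowercasing the relation name and dropping non-essential modifiers, and that identical normalised tuples are merged and counted. The `MODIFIERS` list and the function names are hypothetical, and the probability model itself is not reproduced: the count is only the kind of evidence such a model could build on.

```python
from collections import Counter

# Hypothetical list of non-essential modifiers (an assumption for illustration).
MODIFIERS = {"originally", "also", "really", "first"}


def normalise(relation):
    """Lowercase a relation name and drop the modifiers listed above."""
    tokens = [t for t in relation.lower().split() if t not in MODIFIERS]
    return " ".join(tokens)


def merge(extractions):
    """Count how often each normalised (arg1, relation, arg2) tuple occurs."""
    return Counter((a1, normalise(rel), a2) for a1, rel, a2 in extractions)


# Two surface forms of the same relation collapse into one tuple.
support = merge([
    ("X", "was originally developed by", "Y"),
    ("X", "was developed by", "Y"),
])
```

Here `support` maps the single tuple `("X", "was developed by", "Y")` to a count of 2.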

More recently, ReVerb (Etzioni et al., 2011; Fader et al., 2011), a new and more
efficient OIE system that does not need a classifier, was presented. ReVerb is
based solely on two constraints: (i) a syntactic constraint, which requires that the
relation phrase matches a POS regular expression (verb | verb prep | verb word*
prep); and (ii) a lexical constraint, which requires that each relevant relation
phrase occurs in the corpus with different arguments. The following illustrate
ReVerb extractions:
• {Calcium, prevents, osteoporosis}
• {A galaxy, consists of, stars and stellar remnants}
• {Most galaxies, appear to be, dwarf galaxies, which are small}
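
The two constraints can be sketched as follows. This is a simplified illustration, not ReVerb's implementation: it assumes sentences are already tagged with Penn Treebank POS tags, maps tags to the coarse classes verb, prep and word by hand, and takes everything before and after the relation phrase as the arguments, which is only reasonable for single-clause sentences. The names `coarse`, `extract`, `lexical_filter` and the threshold `min_distinct_args` are hypothetical.

```python
import re
from collections import defaultdict


def coarse(tag):
    """Map a Penn Treebank tag to V (verb), P (prep), W (word) or O (other)."""
    if tag.startswith("VB"):
        return "V"
    if tag in {"IN", "TO", "RP"}:
        return "P"
    if tag.startswith(("NN", "JJ", "RB", "PRP", "DT")):
        return "W"
    return "O"


# Syntactic constraint: verb | verb prep | verb word* prep
RELATION = re.compile(r"V(?:W*P)?")


def extract(tagged):
    """Return (arg1, relation, arg2) triples for one POS-tagged sentence."""
    words = [w for w, _ in tagged]
    classes = "".join(coarse(t) for _, t in tagged)
    triples = []
    for m in RELATION.finditer(classes):
        left, right = words[:m.start()], words[m.end():]
        if left and right:  # a relation needs an argument on each side
            phrase = " ".join(words[m.start():m.end()])
            triples.append((" ".join(left), phrase, " ".join(right)))
    return triples


def lexical_filter(triples, min_distinct_args=2):
    """Lexical constraint: keep relation phrases seen with enough distinct argument pairs."""
    args_by_rel = defaultdict(set)
    for a1, rel, a2 in triples:
        args_by_rel[rel.lower()].add((a1.lower(), a2.lower()))
    return [t for t in triples if len(args_by_rel[t[1].lower()]) >= min_distinct_args]


corpus = [
    [("Calcium", "NNP"), ("prevents", "VBZ"), ("osteoporosis", "NN")],
    [("Exercise", "NN"), ("prevents", "VBZ"), ("obesity", "NN")],
    [("Calcium", "NNP"), ("prevents", "VBZ"), ("osteoporosis", "NN"),
     ("in", "IN"), ("women", "NNS")],
]
candidates = [t for sentence in corpus for t in extract(sentence)]
kept = lexical_filter(candidates)
```

In the third sentence the syntactic pattern yields the overly specific phrase "prevents osteoporosis in", which occurs with a single argument pair and is therefore discarded by the lexical filter, while "prevents" is kept. Multi-verb phrases such as "appear to be" are not handled by this simplification.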

The Never Ending Language Learner (NELL; Carlson et al., 2010a) learns
from reading contents on the Web and gets better at reading as it reads the same
text multiple times. NELL's starting point is: (i) a set of fundamental categories
(e.g., person, sportsTeam, fruit, emotion) and relation types (e.g.,
playsOnTeam(athlete, sportsTeam), playsInstrument(musician, instrument)), which
constitute an ontology; and (ii) a set of 10 to 15 seed examples for each category
and relation. Then, NELL reads web pages continuously, 24 hours a day, to extract
new category instances and new relations between instances, which are used to
populate the ontology. The extracted contents serve as a self-supervised collection
of training examples for the acquisition of new discriminating patterns. NELL
employs coupled training (Carlson et al., 2010b), which combines the simultaneous
training of many extraction methods. The following are examples of NELL extractions:
• musicArtistGenre(Nirvana, Grunge)
• tvStationInCity(WLS-TV, Chicago)
• sportUsesEquip(soccer, balls)
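
NELL's starting point, typed relations plus seed instances, can be pictured with a small data structure. The sketch below is an assumption made for illustration, not NELL's actual representation or learning procedure: the `Ontology` class, its methods and the seed instances are hypothetical, and the only behaviour shown is that a candidate relation instance is accepted when its arguments are known instances of the categories in the relation signature.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    # category name -> set of known instances
    categories: dict = field(default_factory=dict)
    # relation name -> (category of first argument, category of second argument)
    signatures: dict = field(default_factory=dict)
    # accepted relation instances as (relation, arg1, arg2)
    facts: set = field(default_factory=set)

    def add_instance(self, category, instance):
        """Register an instance of a category (a seed or a new extraction)."""
        self.categories.setdefault(category, set()).add(instance)

    def add_fact(self, relation, arg1, arg2):
        """Accept a candidate only if both arguments match the relation signature."""
        first, second = self.signatures[relation]
        if (arg1 in self.categories.get(first, set())
                and arg2 in self.categories.get(second, set())):
            self.facts.add((relation, arg1, arg2))
            return True
        return False


# Relation signature taken from the text; the seed instances are illustrative.
onto = Ontology(signatures={"playsOnTeam": ("athlete", "sportsTeam")})
onto.add_instance("athlete", "Michael Jordan")
onto.add_instance("sportsTeam", "Chicago Bulls")

accepted = onto.add_fact("playsOnTeam", "Michael Jordan", "Chicago Bulls")
rejected = onto.add_fact("playsOnTeam", "Chicago Bulls", "Michael Jordan")
```

The first candidate is accepted and stored in `onto.facts`; the second is rejected because its arguments do not match the signature playsOnTeam(athlete, sportsTeam).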

The main difference between NELL and OIE systems is that NELL learns extractors
for a fixed set of known relations, while an OIE system can extract meaningful
information from any kind of corpus, on any domain, as relations are not given as
a starting point (Etzioni et al., 2011). This also has an impact on the quantity of
extracted knowledge. Still, Mohamed et al. (2011) recently reported how a system
like NELL can learn new relation types between already extracted categories.