24.07.2013 Views

Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...






2.3 Information Extraction from Text

Information extraction (IE; see Moens (2006) for an extensive overview) consists of identifying and classifying information in unstructured data sources. The extracted information thus becomes structured and ready, for instance, to populate a relational database or to be used directly by computational applications. In the specific case of IE from text, the source is text written in natural language.
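As a minimal sketch of how IE turns unstructured text into rows of a relational database, the Python code below uses a single hypothetical regular expression for sentences of the form "<Person> works for <Organization>" and loads the matches into an in-memory SQLite table. The pattern, the example sentences and the helper names `extract_employment` and `populate_database` are illustrative assumptions; real IE systems rely on much richer linguistic analysis than one surface pattern.

```python
import re
import sqlite3

# Hypothetical toy pattern for "<Person> works for <Organization>".
# A person is one or more capitalised words; an organisation is one capitalised word.
PATTERN = re.compile(r"([A-Z][a-z]+(?: [A-Z][a-z]+)*) works for ([A-Z][A-Za-z]+)")


def extract_employment(text: str) -> list[tuple[str, str]]:
    """Return the (person, organization) pairs matched in the text."""
    return PATTERN.findall(text)


def populate_database(rows: list[tuple[str, str]]) -> sqlite3.Connection:
    """Load the extracted pairs into an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employment (person TEXT, organization TEXT)")
    conn.executemany("INSERT INTO employment VALUES (?, ?)", rows)
    conn.commit()
    return conn


text = "Ana Silva works for Google. The weather was mild. Rui Costa works for Microsoft."
rows = extract_employment(text)
conn = populate_database(rows)
print(conn.execute("SELECT person, organization FROM employment ORDER BY person").fetchall())
# [('Ana Silva', 'Google'), ('Rui Costa', 'Microsoft')]
```

The sentence about the weather yields no row because it does not match the pattern: only the kind of information the extractor was designed for becomes structured.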

Although both address the information overload problem, IE should not be confused with information retrieval (IR) (Baeza-Yates and Ribeiro-Neto, 1999), which is the task of locating required information within collections of data. In IR, the information need is specified by a query, which can be, for instance, a set of keywords or a natural language question. The result is often a list of documents judged relevant to the query, which should therefore contain the required information.
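To contrast with IE, the following minimal IR sketch scores each document by the number of query keywords it shares and returns the ids of the matching documents. The keyword-overlap score, the `tokenize` and `retrieve` helpers and the toy collection are assumptions for illustration only; practical IR systems use term weighting and more elaborate ranking models.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lower-case the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, documents: dict[str, str]) -> list[str]:
    """Return ids of documents sharing at least one keyword with the query,
    ordered by the number of shared keywords (ties broken by id)."""
    query_terms = tokenize(query)
    scores = {doc_id: len(query_terms & tokenize(text)) for doc_id, text in documents.items()}
    relevant = [doc_id for doc_id, score in scores.items() if score > 0]
    return sorted(relevant, key=lambda doc_id: (-scores[doc_id], doc_id))


documents = {
    "d1": "Ana Silva works for Google in Lisbon.",
    "d2": "The weather in Lisbon was mild.",
    "d3": "A recipe for Portuguese custard tarts.",
}
print(retrieve("Lisbon Google", documents))  # ['d1', 'd2']
```

The result is a list of documents rather than structured facts: the user still has to read `d1` to learn who works for Google, which is exactly the information IE would make explicit.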

2.3.1 Tasks in Information Extraction from Text

According to Jurafsky and Martin (2009), a complete system for IE from text typically comprises four steps, in which it performs the tasks of named entity recognition (NER), relation detection and classification, temporal event processing, and template filling:

1. NER (Chinchor and Robinson, 1997; Mota and Santos, 2008) is the task of identifying proper names mentioned in text. It can also include entity classification, which consists of assigning each entity a category and, sometimes, a sub-category, from a set that includes, but is often not limited to, people, organizations and places. Moreover, as entities are not always mentioned by the same name, and are sometimes referred to by a pronoun, NER may also need to deal with coreference and anaphora resolution (Mitkov et al., 2000; Recasens et al., 2010). In our work, we are interested in identifying lexical entities rather than named entities.

2. Relation detection and classification (Hendrickx et al., 2010) is closely related to the scope of this thesis. It is the task of identifying semantic relations among the discovered entities, including, but not limited to, those presented in section 2.1.2. Semantic relations between named entities include, for instance, family, employment or geospatial relations (see Freitas et al. (2009) for more).

3. As some relations may hold during certain periods of time and not during others, it is sometimes important to determine when the events described in the text happened. Temporal event processing (Verhagen et al., 2010) involves the analysis of time expressions, which include, for instance: days of the week or months (e.g. Sunday or February), names of special days (e.g. Christmas, Valentine's Day), relative expressions (e.g. in two months, next year), and clock and calendar times (e.g. 17:00, 2012-09-25). This task is, however, out of the scope of this thesis.

4. Template filling is the task of searching for required data in documents that describe stereotypical information and then filling predefined slots with
