24.10.2014 Views

Automatic functional annotation of predicted active sites - European ...


For example, "hunchback" is a protein in Drosophila, but it is also a general English term. Furthermore, protein names mostly consist of multiple words, e.g. "Rho-like protein" or "HIV-1 envelope glycoprotein gp120". An ER system needs to identify all the constituents of a protein name in order to relate the detected entity to its reference entry in a database. The BioCreAtIvE challenge addressed this problem with subtask 1B, whose target is the identification of protein/gene names in text and their annotation with the correct gene identifier. Various solutions were published, ranging from rule-based methods [HFM+05] [TW02] [Fuk98] to machine learning approaches [CMP05]. The developed methods are, in general, reusable for any other biological entity recognition or terminology identification problem.

Work has also been published on the extraction of protein point mutations [RSMA+04] [HLC04] [BW05] [LHC07] [YLPV07], which are one category of protein residue terminology. Other categories are residue sequences and residue interaction pairs. The most widely adopted method for identifying these terminologies is the design of regular expression patterns.
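
As an illustration of the regular expression approach, the following minimal sketch matches point mutation mentions written as wild-type residue, position, mutant residue, in either one-letter ("A123T") or three-letter ("Ala123Thr") form. The pattern and the function name `find_point_mutations` are assumptions made for this example and are not taken from any of the cited systems, which use considerably richer pattern sets.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE = "ACDEFGHIKLMNPQRSTVWY"
THREE = "|".join(THREE_TO_ONE)

# Hypothetical pattern: <wild type><position><mutant>, e.g. A123T or Ala123Thr.
POINT_MUTATION = re.compile(
    rf"\b(?:(?P<w1>[{ONE}])(?P<p1>[1-9]\d*)(?P<m1>[{ONE}])"
    rf"|(?P<w3>{THREE})(?P<p3>[1-9]\d*)(?P<m3>{THREE}))\b"
)


def find_point_mutations(text):
    """Return (wild_type, position, mutant) tuples in one-letter code."""
    results = []
    for m in POINT_MUTATION.finditer(text):
        if m.group("w1"):
            wild, pos, mut = m.group("w1"), m.group("p1"), m.group("m1")
        else:
            # Normalize three-letter mentions to one-letter codes.
            wild = THREE_TO_ONE[m.group("w3")]
            pos = m.group("p3")
            mut = THREE_TO_ONE[m.group("m3")]
        results.append((wild, int(pos), mut))
    return results


if __name__ == "__main__":
    print(find_point_mutations("The A123T and Gly12Val substitutions abolish binding."))
```

A pattern this simple overgenerates: any token of the form letter, digits, letter matches, whether or not it denotes a mutation, so practical systems typically add context checks or validate the residue against the protein sequence.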

2.3.2 Biological relation extraction<br />

Relation extraction (RD) aims to find associations between entities, or between an entity and a terminology, within a text phrase. One objective in biomedical information extraction is the mining of biological facts from text. An example of a biological fact is the semantic relation between two biological entities, such as a protein-protein interaction [TOT04].
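
One very simple way to propose candidate associations between entities within a text phrase is to report every pair of known names that appear in the same sentence. The sketch below assumes a hypothetical set of entity names (`ProtA`, `ProtB`, `ProtC`), naive sentence splitting on punctuation, and case-insensitive substring matching. It is an illustration only and does not reproduce any of the cited systems.

```python
import re
from itertools import combinations


def cooccurring_pairs(text, entity_names):
    """Return (name_a, name_b, sentence) for each pair of names sharing a sentence."""
    # Naive sentence splitting: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pairs = []
    for sentence in sentences:
        lowered = sentence.lower()
        # Case-insensitive substring lookup; a real system would use an ER step here.
        found = sorted(name for name in entity_names if name.lower() in lowered)
        for name_a, name_b in combinations(found, 2):
            pairs.append((name_a, name_b, sentence))
    return pairs


if __name__ == "__main__":
    # Hypothetical example text and entity names.
    text = "ProtA binds ProtB in vitro. ProtC is expressed early."
    for pair in cooccurring_pairs(text, {"ProtA", "ProtB", "ProtC"}):
        print(pair)
```

Such a pairing says nothing about the type or direction of the relation; it only flags that two entities are mentioned together.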

Until now, three strategies have been investigated for biological relation extraction: co-occurrence based analysis [LC05] [SB05], pattern-based approaches [HZH+04] [LCM03], and machine learning based methods [BM05] [BM06]. The common limitation of all of these extraction systems is that only the relation targets, e.g. the proteins within a protein-protein interaction, are extracted. By no means is contextual information considered in

