Automatic functional annotation of predicted active sites - European ...
For example, "hunchback" is a protein in Drosophila, but it is also a general English
term. Furthermore, most protein names consist of multiple words, e.g. "Rho-like protein"
or "HIV-1 envelope glycoprotein gp120". An ER system needs to identify all the
constituents of a protein name in order to relate the detected entity to its reference entry
in a database. The BioCreAtIvE challenge addressed this problem in subtask 1B, whose
target is the identification of protein/gene names in text and their annotation with the
correct gene identifier. Various solutions have been published, ranging from rule-based methods
[HFM+05] [TW02] [Fuk98] to machine learning approaches [CMP05]. The developed
methods are, in general, reusable for any other biological entity recognition or terminology
identification problem.
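The need to capture every constituent of a multi-word name can be illustrated with a greedy longest-match lookup against a lexicon. The following is only a minimal sketch under stated assumptions: the lexicon entries, the identifiers (`ID_0001` etc.) and the function name `find_protein_names` are hypothetical and do not correspond to any real database or to a method from the cited works.

```python
import re

# Hypothetical toy lexicon: lower-cased protein name -> made-up identifier.
LEXICON = {
    "rho-like protein": "ID_0001",
    "hiv-1 envelope glycoprotein gp120": "ID_0002",
    "hunchback": "ID_0003",
}
# Longest name in the lexicon, measured in words.
MAX_WORDS = max(len(name.split()) for name in LEXICON)


def find_protein_names(text):
    """Return (matched name, identifier) pairs, preferring the longest match."""
    tokens = re.findall(r"[A-Za-z0-9\-]+", text)
    hits = []
    i = 0
    while i < len(tokens):
        matched_len = 0
        # Try the longest candidate window first, then shrink it.
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate.lower() in LEXICON:
                hits.append((candidate, LEXICON[candidate.lower()]))
                matched_len = n
                break
        # Skip past a match, otherwise advance by one token.
        i += matched_len if matched_len else 1
    return hits


print(find_protein_names("The HIV-1 envelope glycoprotein gp120 binds CD4."))
```

A lookup of this kind does not resolve the ambiguity mentioned above: it would also tag "hunchback" when the word is used in its general English sense.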
Other published work has focused on the extraction of protein point mutations
[RSMA+04] [HLC04] [BW05] [LHC07] [YLPV07], which are one category of protein
residue terminology. Other categories include residue sequences and residue interaction pairs.
The most widely adopted method for identifying these terminologies is the design of regular
expression patterns.
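As an illustration of the regular-expression approach, the sketch below matches point mutations written in the common wild-type/position/mutant form, such as "A123T" or "Gly12Val". The pattern and the function name `find_point_mutations` are assumptions made for this example; they are not taken from the cited systems, which use their own pattern sets.

```python
import re

# One-letter and three-letter codes of the 20 standard amino acids.
AA1 = "ACDEFGHIKLMNPQRSTVWY"
AA3 = ("Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|"
       "Ser|Thr|Trp|Tyr|Val")

# Wild-type residue, sequence position, mutant residue (e.g. A123T, Gly12Val).
POINT_MUTATION = re.compile(
    rf"\b(?P<wt>[{AA1}]|{AA3})(?P<pos>[1-9][0-9]*)(?P<mut>[{AA1}]|{AA3})\b"
)


def find_point_mutations(text):
    """Return (wild type, position, mutant) tuples found in the text."""
    return [
        (m.group("wt"), int(m.group("pos")), m.group("mut"))
        for m in POINT_MUTATION.finditer(text)
    ]


print(find_point_mutations("The A123T and Gly12Val substitutions reduce activity."))
```

A pattern this permissive also matches tokens that are not mutations (for instance the histone name "H2A"), so a practical system would need additional filtering.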
2.3.2 Biological relation extraction<br />
Relation extraction (RD) aims to find associations between entities, or between an entity
and a terminology, within a text phrase. One objective in biomedical information
extraction is the mining of biological facts from text. An example of a biological fact is
the semantic relation between two biological entities, such as a protein-protein interaction
[TOT04].
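One simple way to propose such associations is to count how often two known entities are mentioned in the same sentence. The sketch below assumes that the entity names are already known, that sentences can be split on terminal punctuation, and that plain substring matching is sufficient; the function name `cooccurrence_counts` and the example sentences are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_counts(text, entity_names):
    """Count, for each entity pair, the sentences that mention both."""
    counts = Counter()
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    for sentence in sentences:
        lowered = sentence.lower()
        # Case-insensitive substring match; sorted so pairs have a stable order.
        present = sorted(
            name for name in set(entity_names) if name.lower() in lowered
        )
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


text = "RhoA binds ROCK1. ROCK1 is a kinase. RhoA activates ROCK1 and mDia1."
print(cooccurrence_counts(text, ["RhoA", "ROCK1", "mDia1"]))
```

A count of this kind only indicates that two entities are mentioned together; it says nothing about the type or direction of the relation.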
Until now, three strategies have been investigated for biological relation extraction:
co-occurrence based analysis [LC05] [SB05], pattern-based approaches [HZH+04] [LCM03],
and machine learning based methods [BM05] [BM06]. The common limitation of all of
these extraction systems is that only the relation targets, e.g. the proteins within a
protein-protein interaction, are extracted. By no means is contextual information considered in