24.10.2014

Automatic functional annotation of predicted active sites - European ...


… requires the identification of chemical functions, which could be found in the context of residue mentions. MEDLINE abstracts have been processed to identify protein mentions in combination with species and residues (F1-measure 0.52; the F1-measure is the harmonic mean of a test's precision and recall, and so summarizes its accuracy in a single score). The identified protein-species-residue triplets have been validated and benchmarked against reference data resources. Contextual features were then extracted through shallow and deep parsing and classified into predefined categories (F1-measure from 0.15 to 0.67). Furthermore, the feature sets have been aligned with annotation types in UniProtKB to assess the relevance of the annotations for ongoing curation projects.
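To make the idea of a protein-species-residue triplet concrete, here is a minimal Python sketch of naive sentence-level co-occurrence. It assumes that residues are written as a three-letter amino acid code followed by a sequence position (e.g. "Ser195" or "His-57") and that protein and species names come from small lookup lists. The names `find_residue_mentions` and `find_triplets`, the regular expression, the dictionaries and the example sentence are all hypothetical; this is not the extraction method used in this work.

```python
import re
from itertools import product

# Three-letter codes of the 20 standard amino acids.
AA3 = ("Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
       "Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val")

# Assumed mention format: a code followed by a position, e.g. "Ser195" or "His-57".
RESIDUE_RE = re.compile(rf"\b({AA3})-?(\d+)\b")


def find_residue_mentions(sentence):
    """Return (residue_code, position) pairs mentioned in one sentence."""
    return [(m.group(1), int(m.group(2))) for m in RESIDUE_RE.finditer(sentence)]


def find_triplets(sentence, proteins, species):
    """Naive sentence-level co-occurrence: combine every known protein name and
    species name found in the sentence with every residue mention in it."""
    found_proteins = [p for p in proteins if p in sentence]
    found_species = [s for s in species if s in sentence]
    residues = find_residue_mentions(sentence)
    return list(product(found_proteins, found_species, residues))


# Hypothetical dictionaries and example sentence.
proteins = ["chymotrypsin"]
species = ["bovine"]
sentence = "In bovine chymotrypsin, Ser195 and His57 belong to the catalytic triad."
triplets = find_triplets(sentence, proteins, species)
print(triplets)
```

Plain co-occurrence over-generates whenever a sentence names several proteins or species, which is one reason extracted triplets have to be validated against reference resources.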

Altogether, the annotations have been assessed automatically and manually against reference data resources.
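The automatic part of such an assessment can be framed as a comparison between extracted items and a reference set. The Python sketch below scores a set of predicted triplets with precision, recall and their harmonic mean, the F1-measure. The function name `precision_recall_f1` and the toy data (`protA`, `protB`, `speciesX`) are hypothetical placeholders for real reference resources.

```python
def precision_recall_f1(predicted, reference):
    """Score predicted items (e.g. protein-species-residue triplets)
    against a reference set; returns (precision, recall, F1)."""
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)  # items found in both sets
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data: four predicted triplets, three reference triplets.
predicted = {
    ("protA", "speciesX", "Ser195"),
    ("protA", "speciesX", "His57"),
    ("protA", "speciesX", "Gly193"),
    ("protB", "speciesX", "Cys25"),
}
reference = {
    ("protA", "speciesX", "Ser195"),
    ("protA", "speciesX", "His57"),
    ("protA", "speciesX", "Asp102"),
}
p, r, f1 = precision_recall_f1(predicted, reference)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
# prints "precision=0.50 recall=0.67 F1=0.57"
```

Two of four predictions are confirmed (precision 0.5) and two of three reference items are recovered (recall 2/3), giving F1 = 4/7.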

The whole of MEDLINE has been processed to extract annotations for residues. A subset of the identified catalytic sites could be cross-validated against the Catalytic Site Atlas (CSA; 44 out of 221). Of the 512 protein residues from MSDsite, 429 were then annotated with contextual data. Altogether, MEDLINE does not provide sufficient data to fully annotate the content of the PDB. Conversely, residue annotation relies on a different feature set from the one provided by GO, and incomplete annotations in the reference datasets can be filled from the public literature.
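Expressed as percentages, the two reported ratios show the coverage gap more directly. A trivial Python check, assuming each pair reads as matched items out of the total considered (the helper name `percent` is hypothetical):

```python
def percent(matched, total):
    """Share of `total` covered by `matched`, as a percentage."""
    return 100.0 * matched / total


# Pairs taken from the text above.
csa_share = percent(44, 221)       # catalytic sites cross-validated against CSA
msdsite_share = percent(429, 512)  # MSDsite residues annotated with contextual data
print(f"CSA: {csa_share:.1f}%  MSDsite: {msdsite_share:.1f}%")
# prints "CSA: 19.9%  MSDsite: 83.8%"
```

Under that reading, roughly one in five identified catalytic sites could be confirmed in CSA, while most but not all MSDsite residues received contextual annotation.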
