<strong>EMBL</strong> Research at a Glance 2009<br />
Christoph<br />
Steinbeck<br />
PhD 1995, Rheinische<br />
Friedrich-Wilhelms-Universität,<br />
Bonn.<br />
Postdoctoral research at<br />
Tufts University, Boston and<br />
the Max Planck Institute for<br />
Chemical Ecology, Jena,<br />
Germany, 1997-2002.<br />
Habilitation, 2003, Organic<br />
Chemistry, Friedrich-Schiller-<br />
Universität, Jena, Germany.<br />
Head of Research Group for<br />
Molecular Informatics,<br />
Cologne University<br />
Bioinformatics Center<br />
(CUBIC), 2002-2007.<br />
Lecturer in<br />
Chemoinformatics, University<br />
of Tübingen, 2007.<br />
Team leader at <strong>EMBL</strong>-EBI<br />
since 2008.<br />
Chemoinformatics and metabolism<br />
Previous and current research<br />
The Chemoinformatics and Metabolism team aims to provide the biomedical community with<br />
information on small molecules and their interplay with biological systems. The group develops<br />
methods to decipher, organise and publish the small-molecule metabolic content of organisms. We<br />
develop tools that quickly determine the structure of metabolites by stochastic screening of large<br />
candidate spaces and that help identify molecules with desired properties. This requires algorithms,<br />
based on machine learning and other statistical methods, for predicting the spectroscopic and other<br />
physicochemical properties of chemical graphs.<br />
We are also investigating the extraction of chemical knowledge from the printed literature by<br />
text and graph mining methods, improved dissemination of information in life science publications,<br />
and open chemoinformatics workflow systems. Together with an international group<br />
of collaborators, we develop the Chemistry Development Kit (CDK), the leading open source library<br />
for structural chemoinformatics, as well as the chemoinformatics subsystem of Bioclipse, an<br />
award-winning rich client for chemo- and bioinformatics.<br />
Future projects and goals<br />
The recently acquired resource of large-scale drug activity data at the EBI creates exciting new opportunities<br />
on both the research and the service side (www.ebi.ac.uk/Information/News/pdf/Press23July08.pdf).<br />
Our team has started to create an open source chemical search engine for<br />
the new resource, which will be the first open source chemistry search engine for the widely used<br />
Oracle™ database system. A combination of the new chemogenomics data and the Chemistry<br />
Development Kit will allow us to create open structure-activity models and to assist efforts in wet<br />
lab screening in areas such as library design.<br />
On the service side, ensuring sustainable growth of the ChEBI database will be the focus of our<br />
attention. The number of marketed and developed drugs in the World Drug Index alone currently<br />
amounts to more than 80,000 compounds. Even assuming that only a handful of metabolites are produced<br />
by organisms for each of these drugs, the scale of the task ahead becomes clear. This task requires<br />
not only a larger team for data collection and curation but also research into the automated assembly<br />
and validation of ChEBI datasets to aid the human curators. Last but not least, 2009<br />
will reveal the EBI’s solution on how to integrate the chemogenomic<br />
data with existing chemical resources at the institute.<br />
Computer-Assisted Structure Elucidation uses a structure generation<br />
engine to produce chemical spaces based on boundary conditions such<br />
as the gross formula of the unknown compound, determined for instance<br />
by mass spectrometry. These chemical spaces are then crawled and<br />
candidate structures in them inspected for fitness by comparing predicted<br />
and measured properties such as NMR spectra. Based on calculated<br />
fitness values, a ranking is presented to the user.<br />
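The generate, score and rank loop described in this caption can be illustrated with a minimal Python sketch. It is not the team's implementation: the function names, the toy candidate set, the made-up shift values and the fitness formula (one divided by one plus the mean absolute deviation between sorted predicted and measured shifts) are all assumptions chosen for illustration. A real system would obtain candidates from a structure generator and predicted spectra from a trained model.

```python
from typing import Callable, List, Sequence, Tuple


def shift_fitness(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Hypothetical fitness in [0, 1]; 1.0 means the sorted shift lists agree exactly."""
    if len(predicted) != len(measured):
        return 0.0  # assumption: a different number of signals cannot match
    if not measured:
        return 1.0  # assumption: nothing to compare counts as a perfect match
    pairs = zip(sorted(predicted), sorted(measured))
    mean_abs_dev = sum(abs(p - m) for p, m in pairs) / len(measured)
    return 1.0 / (1.0 + mean_abs_dev)


def rank_candidates(
    candidates: Sequence[str],
    predict: Callable[[str], Sequence[float]],
    measured: Sequence[float],
) -> List[Tuple[str, float]]:
    """Score every candidate against the measured data and sort best-first."""
    scored = [(name, shift_fitness(predict(name), measured)) for name in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy stand-in for a spectrum predictor: made-up shift lists per candidate label.
TOY_PREDICTIONS = {
    "candidate_A": [14.0, 22.5, 60.0],
    "candidate_B": [18.0, 58.0, 170.0],
    "candidate_C": [14.5, 23.0, 61.0],
}

measured_shifts = [14.1, 22.9, 60.4]  # made-up "experimental" values
ranking = rank_candidates(
    list(TOY_PREDICTIONS), TOY_PREDICTIONS.__getitem__, measured_shifts
)

for name, fitness in ranking:
    print(f"{name}: {fitness:.3f}")
```

In this sketch the candidate space is a fixed list that is scored exhaustively. The stochastic screening mentioned in the main text would instead sample a much larger space and use the same kind of fitness value to decide which candidates to keep.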
Selected references<br />
Kuhn, S. et al. (2008). Building blocks for automated elucidation of<br />
metabolites: Machine learning methods for NMR prediction. BMC<br />
Bioinformatics, 9, 00<br />
Willighagen, E.L. et al. (2007). Userscripts for the life sciences. BMC<br />
Bioinformatics, 8, 87<br />
Han, Y.Q. & Steinbeck, C. (200). Evolutionary-algorithm-based<br />
strategy for computer-assisted structure elucidation. J. Chem. Inf.<br />
Com. Sci., , 89-98<br />
Steinbeck, C. (2001). SENECA: A platform-independent, distributed,<br />
and parallel system for computer-assisted structure elucidation in<br />
organic chemistry. J. Chem. Inf. Com. Sci., 1, 1500-1507<br />