Structural and Computational Biology Unit
Data integration and knowledge management
Previous and current research
Today it is widely recognised that comprehensive integration of data can be one of the key factors in improving productivity and efficiency in the biological research process. Successful data integration helps researchers to discover relationships that enable them to make better and faster decisions, thus saving considerable time and money.
Over the last 20 years, biological research has seen a very strong proliferation of data sources. Each research group and new experimental technique generates a source of valuable data. The creation, use, integration and warehousing of biological data is central to large-scale efforts in understanding biological systems. These tasks pose significant challenges from the standpoint of data storage, indexing, retrieval and system scalability over disparate types of data.
Examples of the graphical features of Arena3D. Heterogeneous data types can be visualised in a 3D environment and a range of layout and cluster algorithms can be applied.
The current systems biology approaches are generating data sets with rapidly growing complexity and dynamics. One major challenge is to provide the mechanism for accessing the heterogeneous data and to detect the important information. We develop interactive visual data analysis techniques using automatic data analysis pipelines. The combination of techniques allows us to analyse otherwise unmanageable amounts of complex data.
The principal aim of the group is to capture and centralise the knowledge generated by the scientists in the several divisions, and to organise that knowledge such that it can be easily mined, browsed and navigated. By providing access to all scientists in the organisation, it will foster collaborations between researchers in different cross-functional groups.
The group is involved in the following areas:
• Data schema design and technical implementation;
• Metadata annotation with respect to experimental data;
• Design and implementation of scientific data portals;
• Providing access to, and developing further, data-mining tools (e.g. text-mining);
• Visualisation environment for systems biology data.
Reinhard Schneider
PhD 199, University of Heidelberg.
Postdoctoral research at EMBL.
Co-founder and Chief Information Officer at LION bioscience AG.
Chief Executive Officer at LION bioscience Research Inc., Cambridge, MA.
Team leader at EMBL since 200.
Future projects and goals<br />
Our goal is to develop a comprehensive knowledge platform for the life sciences. We will first focus on the biology-driven research areas, but will extend into chemistry-related fields, primarily by collaborating with groups inside EMBL. Other research areas will include advanced data-mining and visualisation techniques.
OnTheFly and Reflect server. Parts (A), (B) and (C) show an annotated table (A) of a PDF full-text article, the generated popup window with information about the protein YGL227W (B), and an automatically generated protein-protein interaction network (C) of associated entities for the proteins shown in part (A). Part (D) shows the architecture and functionality.
Selected references<br />
Pavlopoulos, G.A., O’Donoghue, S.I., Satagopam, V.P., Soldatos, T.G., Pafilis, E. & Schneider, R. (2008). Arena3D: visualization of biological networks in 3D. BMC Syst. Biol., 2, 10
Erhardt, R.A., Schneider, R. & Blaschke, C. (2006). Status of text-mining techniques applied to biomedical text. Drug Discov. Today, 11, 315-325
Kremer, A., Schneider, R. & Terstappen, G.C. (2005). A bioinformatics perspective on proteomics: data storage, analysis, and integration. Biosci. Rep., 25, 95-106
Ofran, Y., Punta, M., Schneider, R. & Rost, B. (2005). Beyond annotation transfer by homology: novel protein-function prediction methods to assist drug discovery. Drug Discov. Today, 10, 175-182