<strong>EMBL</strong> Research at a Glance 2009<br />
Peter Rice<br />
BSc 1976, University of<br />
Liverpool.<br />
Previously at <strong>EMBL</strong><br />
Heidelberg (1987–199), the<br />
Sanger Centre (199–2000)<br />
and LION Bioscience (2000–<br />
2002).<br />
Team leader at <strong>EMBL</strong>-EBI<br />
since 2003.<br />
Grid and e-Science research and development<br />
Previous and current research<br />
The team’s focus is on the integration of bioinformatics tools and data resources. We have the<br />
remit to investigate and advise on the e-Science and Grid technology requirements of <strong>EMBL</strong>-EBI,<br />
through application development, training exercises and participation in international projects<br />
and standards development. Our group is responsible for the EMBOSS open source sequence<br />
analysis package, the Taverna bioinformatics workflow system (originally developed as part of the<br />
myGrid UK e-Science project) and for various projects (including EMBRACE and ComparaGrid)<br />
that integrate access to bioinformatics tools and data content.<br />
To date, Grid development has focussed on the basic issues of storage, computation and resource<br />
management needed to make a global scientific community’s information and tools accessible in<br />
a high-performance environment. However, from the e-Science point of view, the purpose of the<br />
Grid is to deliver a collaborative and supportive environment that enables geographically distributed<br />
scientists to achieve research goals more effectively, while allowing their results to be used in<br />
developments elsewhere.<br />
Our group has been the biological specialist participant in the UK-funded myGrid project and<br />
this collaboration is continuing with the Open Middleware Infrastructure Institute (OMII-UK).<br />
This project was aimed at developing and maintaining open source high-level service-based middleware<br />
to support the construction, management and sharing of data-intensive in silico experiments<br />
in biology. <strong>EMBL</strong>-EBI contributes through the Taverna workbench and as a developer and provider of application and data services, a role that
continues through the EMBRACE and EMBOSS projects.<br />
A key factor in the success of EMBOSS, and in particular its selection as the application platform for the EMBRACE and myGrid projects,<br />
has been its development and implementation of the AJAX Command Definition standard or ACD files. These define the interface of each<br />
EMBOSS application and are directly used by the application on startup for all processing of the command line and interaction with the user.<br />
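The idea of a declarative interface definition that drives all command-line processing can be sketched as follows. This is an illustration only, written in Python and not taken from EMBOSS: the dictionary layout, the application name `demo_seqlen` and the parameters `sequence` and `minlen` are hypothetical, and the format does not reproduce real ACD syntax.<br />

```python
import argparse

# Hypothetical, simplified interface definition in the spirit of an ACD file.
# This is NOT real ACD syntax; every name below is an illustrative assumption.
DEFINITION = {
    "application": "demo_seqlen",
    "documentation": "Report the length of a sequence",
    "parameters": [
        {"name": "sequence", "type": str, "required": True,
         "help": "Input sequence"},
        {"name": "minlen", "type": int, "default": 0,
         "help": "Minimum length to report"},
    ],
}


def build_parser(definition):
    """Build the command-line parser entirely from the definition."""
    parser = argparse.ArgumentParser(
        prog=definition["application"],
        description=definition["documentation"],
    )
    for param in definition["parameters"]:
        parser.add_argument(
            "--" + param["name"],
            type=param["type"],
            required=param.get("required", False),
            default=param.get("default"),
            help=param["help"],
        )
    return parser


def run(argv):
    """Parse argv using the definition, then apply the application logic."""
    args = build_parser(DEFINITION).parse_args(argv)
    length = len(args.sequence)
    # Return the length only if it meets the requested minimum.
    return length if length >= args.minlen else None
```

Calling `run(["--sequence", "ACGT"])` parses the arguments using nothing but the definition. Because the interface lives in data and not in code, the same definition could in principle also be read by a wrapper such as a workflow client or a web service generator.<br />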
The EMBRACE project, an EU-funded Network of Excellence, is now in its second year, with the aim of defining and implementing a consistent<br />
standard interface to integrate data content and analysis tools across all <strong>EMBL</strong>-EBI core databases and those provided by our partners.<br />
The early focus of this five-year project has been on the sequence and structure data resources at EBI and the EMBOSS applications. Our group<br />
is also active in defining the core technologies to be used by EMBRACE, including BioMart data federation methods, web services provided<br />
by the EBI External Services team, and the Taverna workbench as an end-user client.<br />
Future projects and goals<br />
The services provided by the group remain largely SOAP-based web services. These have proved highly useful for prototyping and developing<br />
service and metadata standards. We are looking, especially through the EMBRACE project, to migrate to true Grid services, but like many<br />
other groups we are waiting for the long-anticipated merging of web and grid service standards.<br />
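For readers unfamiliar with SOAP, the sketch below builds the kind of XML request envelope that such a web service exchanges. It is a minimal illustration using only the Python standard library; the service namespace, the operation name `fetchEntry` and its parameter are hypothetical and do not describe any actual EBI service.<br />

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace, used here for illustration only.
SERVICE_NS = "http://example.org/hypothetical-service"


def build_request(operation, params):
    """Return a SOAP 1.1 request envelope as a string."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The operation element lives in the service's own namespace.
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical operation and parameter names.
request_xml = build_request("fetchEntry", {"id": "X00001"})
```

In practice a client would usually generate such messages from the service's WSDL description, not assemble them by hand.<br />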
The EMBOSS project plans to expand in the coming few years to cover bioinformatics more generally, including genomics, protein structure,<br />
gene expression, proteomics, phylogenetics, genetics and biostatistics. This will require the participation of external groups to expand the project<br />
beyond its current EBI base, and we are actively seeking potential partners in each area. We expect to build a service-based e-Science<br />
architecture around the applications and data resources through the EMBRACE project, with support and guidance from the community of<br />
users in academia and industry.<br />
The EMBRACE project will move beyond sequence data and analysis services to cover the remaining areas of the EBI’s core databases and to<br />
integrate services from our partners using the same standards and interfaces.<br />
Selected references<br />
Belhajjame, K. et al. (2008). Metadata management in the Taverna<br />
workflow system. In ‘Proceedings CCGRID 2008 – 8th IEEE<br />
International Symposium on Cluster Computing and the Grid’, 651-656<br />
Lanzen, A. & Oinn, T. (2008). The Taverna Interaction Service:<br />
Enabling manual interaction in workflows. Bioinformatics, 24, 1118-1120<br />
Li, P. et al. (2008). Automated manipulation of systems biology<br />
models using libSBML within Taverna workflows. Bioinformatics, 24,<br />
287-289<br />
Li, P. et al. (2008). Performing statistical analyses on quantitative<br />
data in Taverna workflows: An example using R and maxdBrowse to<br />
identify differentially-expressed genes from microarray data. BMC<br />
Bioinformatics, 9, Article 33<br />