Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
<strong>EMBL</strong>-EBI<br />
The Microarray Software Development Team<br />
Previous and current research<br />
Our team has been developing software for ArrayExpress since 2001 (Sarkans et al., 2005). As of<br />
October 2008 ArrayExpress holds data from almost 200,000 microarray hybridisations and is one<br />
of the major data resources of <strong>EMBL</strong>-EBI. The software development team has built the following<br />
components of the ArrayExpress infrastructure:<br />
• Repository – the archival MIAME-compliant database for the data that support publications;<br />
• Data Warehouse – a query-oriented database of gene expression profiles;<br />
• MIAMExpress – a data annotation and submission system;<br />
• Expression Profiler – a web-based data analysis toolset;<br />
• components used internally by the ArrayExpress production team.<br />
In 2008 we continued our efforts to rebuild the ArrayExpress infrastructure, with an emphasis on<br />
simplification of data management tasks and software components, tighter integration of data submission<br />
and management, and use of automated workflows for data processing.<br />
Ugis Sarkans<br />
PhD in Computer Science,<br />
University of Latvia, 1998.<br />
Postdoctoral research at the<br />
University of Wales,<br />
Aberystwyth.<br />
Technical team leader at<br />
<strong>EMBL</strong>–EBI since 2006.<br />
The team is also working on data management and integration solutions for domains beyond microarray data. We participate in the MolPAGE<br />
project, an integrated EU effort that aims to find biomarkers for genetic diseases, and type II diabetes in particular. Our team’s main achievements<br />
in this project are: the basic principles and architecture of a framework for managing human sample information and output of highthroughput<br />
analytical platforms; and a generic data reannotation, integration and warehousing solution.<br />
We also participated in data standardisation efforts, in particular MAGE (Spellman et al., 2002) and MAGE-TAB (Rayner et al., 2006) and<br />
regard this work as crucial for the successful development and support of a high-throughput data management infrastructure.<br />
Future projects and goals<br />
The main goal for 2009 is gradually rolling out the next-generation infrastructure for ArrayExpress. The main tasks of this project include:<br />
• replacing the central databases of ArrayExpress with a single unified database;<br />
• organising a system for synchronising the old and new databases, enabling testing and gradual switchover to the new infrastructure;<br />
• porting the existing workflows to the new system;<br />
• porting ArrayExpress interfaces to work with the new back-end.<br />
There are significant changes planned also from the perspective of external users. There is a clear conceptual separation between ArrayExpress<br />
Repository and ArrayExpress Warehouse/Atlas. The former serves the need to provide an archive of scientific record, working in tandem<br />
with journal publications, while the latter provides researchers with biologically meaningful information about expression of genes in<br />
various conditions – either in a more ‘raw’ form (as in the Warehouse), or already processed (as in the Atlas). However, there is no need to<br />
maintain a distinction between these resources from the web access point of view. We plan to build a unified interface that, in response to user<br />
queries, will provide information on all levels of granularity: whole experiments (Repository), sets of expression profiles (Warehouse), and<br />
interesting facts about gene expression (Atlas), providing users with a choice for further investigation.<br />
Selected references<br />
Haudry, Y. et al. (2008). DXpress: a database for cross-species<br />
expression pattern comparisons. Nucleic Acids Res., 36, D87-D853<br />
Jones, A.R. et al. (2007). The Functional Genomics Experiment<br />
model (FuGE): An extensible framework for standards in functional<br />
genomics. Nat. Biotechnol., 25, 1127-1133<br />
Rayner, T.F. et al. (2006). A simple spreadsheet-based, MIAMEsupportive<br />
format for microarray data: MAGE-TAB. BMC<br />
Bioinformatics, 7, 89<br />
Sarkans, U. et al. (2005). The ArrayExpress gene expression<br />
database: a software engineering and implementation perspective.<br />
Bioinformatics, 21, 195-1501<br />
Spellman, P.T. et al., (2002). Design and implementation of<br />
microarray gene expression markup language (MAGE-ML). Genome<br />
Biol., 3, RESEARCH006<br />
83