21.11.2014 Views

ayout 1 - EMBL Grenoble

ayout 1 - EMBL Grenoble

ayout 1 - EMBL Grenoble

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

<strong>EMBL</strong>-EBI<br />

The Microarray Software Development Team<br />

Previous and current research<br />

Our team has been developing software for ArrayExpress since 2001 (Sarkans et al., 2005). As of<br />

October 2008 ArrayExpress holds data from almost 200,000 microarray hybridisations and is one<br />

of the major data resources of <strong>EMBL</strong>-EBI. The software development team has built the following<br />

components of the ArrayExpress infrastructure:<br />

• Repository – the archival MIAME-compliant database for the data that support publications;<br />

• Data Warehouse – a query-oriented database of gene expression profiles;<br />

• MIAMExpress – a data annotation and submission system;<br />

• Expression Profiler – a web-based data analysis toolset;<br />

• components used internally by the ArrayExpress production team.<br />

In 2008 we continued our efforts to rebuild the ArrayExpress infrastructure, with an emphasis on<br />

simplification of data management tasks and software components, tighter integration of data submission<br />

and management, and use of automated workflows for data processing.<br />

Ugis Sarkans<br />

PhD in Computer Science,<br />

University of Latvia, 1998.<br />

Postdoctoral research at the<br />

University of Wales,<br />

Aberystwyth.<br />

Technical team leader at<br />

<strong>EMBL</strong>–EBI since 2006.<br />

The team is also working on data management and integration solutions for domains beyond microarray data. We participate in the MolPAGE<br />

project, an integrated EU effort that aims to find biomarkers for genetic diseases, and type II diabetes in particular. Our team’s main achievements<br />

in this project are: the basic principles and architecture of a framework for managing human sample information and output of highthroughput<br />

analytical platforms; and a generic data reannotation, integration and warehousing solution.<br />

We also participated in data standardisation efforts, in particular MAGE (Spellman et al., 2002) and MAGE-TAB (Rayner et al., 2006) and<br />

regard this work as crucial for the successful development and support of a high-throughput data management infrastructure.<br />

Future projects and goals<br />

The main goal for 2009 is gradually rolling out the next-generation infrastructure for ArrayExpress. The main tasks of this project include:<br />

• replacing the central databases of ArrayExpress with a single unified database;<br />

• organising a system for synchronising the old and new databases, enabling testing and gradual switchover to the new infrastructure;<br />

• porting the existing workflows to the new system;<br />

• porting ArrayExpress interfaces to work with the new back-end.<br />

There are significant changes planned also from the perspective of external users. There is a clear conceptual separation between ArrayExpress<br />

Repository and ArrayExpress Warehouse/Atlas. The former serves the need to provide an archive of scientific record, working in tandem<br />

with journal publications, while the latter provides researchers with biologically meaningful information about expression of genes in<br />

various conditions – either in a more ‘raw’ form (as in the Warehouse), or already processed (as in the Atlas). However, there is no need to<br />

maintain a distinction between these resources from the web access point of view. We plan to build a unified interface that, in response to user<br />

queries, will provide information on all levels of granularity: whole experiments (Repository), sets of expression profiles (Warehouse), and<br />

interesting facts about gene expression (Atlas), providing users with a choice for further investigation.<br />

Selected references<br />

Haudry, Y. et al. (2008). DXpress: a database for cross-species<br />

expression pattern comparisons. Nucleic Acids Res., 36, D87-D853<br />

Jones, A.R. et al. (2007). The Functional Genomics Experiment<br />

model (FuGE): An extensible framework for standards in functional<br />

genomics. Nat. Biotechnol., 25, 1127-1133<br />

Rayner, T.F. et al. (2006). A simple spreadsheet-based, MIAMEsupportive<br />

format for microarray data: MAGE-TAB. BMC<br />

Bioinformatics, 7, 89<br />

Sarkans, U. et al. (2005). The ArrayExpress gene expression<br />

database: a software engineering and implementation perspective.<br />

Bioinformatics, 21, 195-1501<br />

Spellman, P.T. et al., (2002). Design and implementation of<br />

microarray gene expression markup language (MAGE-ML). Genome<br />

Biol., 3, RESEARCH006<br />

83

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!