Bioinformatics Biocomputing - Ercim


SPECIAL THEME: BIOINFORMATICS

Co-operative Environments for Genomes Annotation: from Imagene to Geno-Annot

by Claudine Médigue, Yves Vandenbrouke, François Rechenmann and Alain Viari

‘Imagene’ is a co-operative computer environment for the annotation and analysis of genomic sequences developed in collaboration between INRIA, Université Paris 6, Institut Pasteur and the ILOG company. The first version of this software was dedicated to bacterial chromosomes. Its capabilities are currently being extended to handle both prokaryotic and eukaryotic data and to link pure genomic data to ‘post-genomic’ data, particularly metabolic and gene expression data.

In the context of large-scale genomic sequencing projects, there is a growing need to integrate specific sequence analysis tools within data management systems. With this aim in view, we have developed the Imagene co-operative computer environment, dedicated to automatic sequence annotation and analysis (http://abraxa.snv.jussieu.fr/imagene). In this system, the biological knowledge produced in the course of a genome sequencing project (putative genes, regulatory signals, etc), together with the methodological knowledge, represented by an extensible set of sequence analysis methods, is uniformly represented in an object-oriented model.

Imagene is the result of a five-year collaboration between INRIA, Université Paris 6, the Institut Pasteur and the ILOG company. The system has been implemented using an object-oriented model and a co-operative solving engine provided by ILOG. In Imagene, a global problem (task) is solved by successive decompositions into smaller sub-tasks. During execution, the various sub-tasks are graphically displayed to the user. In that sense, Imagene is more transparent to the user than a traditional menu-driven package for sequence analysis, since all the steps in the resolution are clearly identified. Moreover, once a task has been solved, the user can restart it at any point; the system then keeps track of the different versions of the execution. This makes it possible to maintain several hypotheses in parallel during the analysis. Imagene also provides a user interface to display, on the same picture, the results produced by one or several strategies (see Figure). Due to the homogeneity of the whole software, this display is fully interactive and the graphical objects are directly connected to their database counterparts.

Figure: Imagene view of a fragment of the B. subtilis chromosome. The display superimposes the output of several methods. Red boxes represent putative protein-coding regions (genes); blue boxes represent the result of a data bank similarity scan (here the Blastx program); the yellow curve represents the coding probability as evaluated using a Markov chain. The translated protein sequence of the currently selected gene is shown in the insert.

Imagene has been used within several bacterial genome sequencing projects (Bacillus subtilis and Mycoplasma pulmonis) and has proved to be particularly useful for pinpointing sequencing errors and atypical genes. However, this first version suffers from several drawbacks. First, it was limited to the representation of prokaryotic data only; second, the development tools were commercial, which made the software difficult to distribute; last, it was designed to handle pure sequence data from a single genome. In order to overcome these limitations, we undertook a new project (Geno-Annot) through a collaboration between INRIA, the Institut Pasteur and the Genome-Express biotech company. As a first step, the data model was extended to eukaryotes and completely re-implemented using the AROM system developed at INRIA (http://www.inrialpes.fr/romans/pub/arom). We are now in the process of re-designing the task engine and the graphical user interfaces in Java. Finally, our ultimate goal is to integrate Geno-Annot within a more general environment (called Geno-*) in order to fully link all the pieces of genomic information together (ie sequence data, metabolism, gene expression etc). Geno-Annot is a two-year project that started in September 1999.
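The task decomposition and versioned restarts described earlier in this article can be illustrated with a small sketch. It is not Imagene's implementation, which relied on a solving engine provided by ILOG. The `Task`, `Execution` and `Engine` classes, the toy `find_orfs` and `count_orfs` steps, and the `min_length` parameter are all hypothetical names chosen for illustration, and the "sequence analysis" here is reduced to filtering a list of candidate lengths.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Context = Dict[str, object]


@dataclass
class Task:
    """A task is either elementary (has an action) or decomposed into sub-tasks."""
    name: str
    action: Optional[Callable[[Context], Context]] = None
    subtasks: List["Task"] = field(default_factory=list)

    def steps(self) -> List["Task"]:
        """Flatten the decomposition into the ordered list of elementary tasks."""
        if not self.subtasks:
            return [self]
        return [step for sub in self.subtasks for step in sub.steps()]


@dataclass
class Execution:
    """One version of a run: snapshots[0] is the input, snapshots[i + 1] the state after step i."""
    version: int
    snapshots: List[Context]


class Engine:
    """Runs a task and keeps every execution, so alternative hypotheses can coexist."""

    def __init__(self, task: Task):
        self.steps = task.steps()
        self.history: List[Execution] = []

    def run(self, initial: Context) -> Execution:
        return self._execute([dict(initial)], start=0)

    def restart(self, version: int, step: int, overrides: Context) -> Execution:
        """Re-run from `step` of an earlier version, with some inputs changed."""
        previous = self.history[version].snapshots
        kept = [dict(s) for s in previous[: step + 1]]  # state up to the restart point
        kept[-1].update(overrides)
        return self._execute(kept, start=step)

    def _execute(self, snapshots: List[Context], start: int) -> Execution:
        for task in self.steps[start:]:
            snapshots.append(task.action(dict(snapshots[-1])))
        execution = Execution(version=len(self.history), snapshots=snapshots)
        self.history.append(execution)  # earlier versions are never overwritten
        return execution


def find_orfs(ctx: Context) -> Context:
    # Toy stand-in for gene prediction: keep candidate lengths above a threshold.
    ctx["orfs"] = [n for n in ctx["candidates"] if n >= ctx["min_length"]]
    return ctx


def count_orfs(ctx: Context) -> Context:
    ctx["gene_count"] = len(ctx["orfs"])
    return ctx


annotate = Task("annotate", subtasks=[
    Task("predict genes", subtasks=[Task("find ORFs", action=find_orfs)]),
    Task("summarise", action=count_orfs),
])

engine = Engine(annotate)
first = engine.run({"candidates": [90, 300, 450], "min_length": 100})
# Restart the first version at step 0 with a stricter threshold; both versions are kept.
second = engine.restart(first.version, step=0, overrides={"min_length": 400})
```

In this sketch, storing a snapshot after every elementary step is what makes a restart "at any point" possible: the new version copies the earlier states and only recomputes the later steps, while the original version stays in `engine.history` for comparison.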

Links:
Action Helix: http://www.inrialpes.fr/helix.html
Imagene: http://abraxa.snv.jussieu.fr/imagene

Please contact:
Alain Viari – INRIA
Tel: +33 4 76 61 54 74
E-mail: alain.viari@inrialpes.fr

22 ERCIM News No. 43, October 2000
