Bioinformatics Biocomputing - Ercim

SPECIAL THEME: BIOINFORMATICS

Co-operative Environments for Genomes Annotation: from Imagene to Geno-Annot
by Claudine Médigue, Yves Vandenbrouke, François Rechenmann and Alain Viari

‘Imagene’ is a co-operative computer environment for the annotation and analysis of genomic sequences, developed in collaboration between INRIA, Université Paris 6, Institut Pasteur and the ILOG company. The first version of this software was dedicated to bacterial chromosomes. Its capabilities are currently being extended to handle both prokaryotic and eukaryotic data and to link pure genomic data to ‘post-genomic’ data, particularly metabolic and gene expression data.

In the context of large-scale genomic sequencing projects, the need is growing for integration of specific sequence analysis tools within data management systems. With this aim in view, we have developed the Imagene co-operative computer environment dedicated to automatic sequence annotation and analysis (http://abraxa.snv.jussieu.fr/imagene). In this system, the biological knowledge produced in the course of a genome sequencing project (putative genes, regulatory signals, etc.) and the methodological knowledge, in the form of an extensible set of sequence analysis methods, are uniformly represented in an object-oriented model.

Imagene is the result of a five-year collaboration between INRIA, Université Paris 6, the Institut Pasteur and the ILOG company. The system has been implemented using an object-oriented model and a co-operative solving engine provided by ILOG. In Imagene, a global problem (task) is solved by successive decompositions into smaller sub-tasks. During execution, the various sub-tasks are graphically displayed to the user. In that sense, Imagene is more transparent to the user than a traditional menu-driven package for sequence analysis, since all the steps in the resolution are clearly identified. Moreover, once a task has been solved, the user can restart it at any point; the system then keeps track of the different versions of the execution. This makes it possible to maintain several hypotheses in parallel during the analysis.

Imagene also provides a user interface to display, on the same picture, the results produced by one or several strategies (see Figure). Due to the homogeneity of the whole software, this display is fully interactive and the graphical objects are directly connected to their database counterparts.

Imagene has been used within several bacterial genome sequencing projects (Bacillus subtilis and Mycoplasma pulmonis) and has proved particularly useful for pinpointing sequencing errors and atypical genes. However, this first version suffers from several drawbacks. First, it was limited to the representation of prokaryotic data only; second, the development tools were commercial, which made the software difficult to distribute; last, it was designed to handle pure sequence data from a single genome.

In order to overcome these limitations, we undertook a new project (Geno-Annot) through a collaboration between INRIA, the Institut Pasteur and the Genome-Express biotech company. As a first step, the data model was extended to eukaryotes and completely re-implemented using the AROM system developed at INRIA (http://www.inrialpes.fr/romans/pub/arom). We are now in the process of re-designing the task-engine and the graphical user interfaces in Java.

Finally, our ultimate goal is to integrate Geno-Annot within a more general environment (called Geno-*) in order to fully link all the pieces of genomic information together (i.e. sequence data, metabolism, gene expression, etc.). Geno-Annot is a two-year project that started in September 1999.

Figure: Imagene view of a fragment of the B. subtilis chromosome. The display superimposes the output of several methods. Red boxes represent putative protein-coding regions (genes); blue boxes represent the result of a data bank similarity scan (here the Blastx program); the yellow curve represents the coding probability as evaluated using a Markov chain. The translated protein sequence of the currently selected gene is shown in the inset.

Links:
Action Helix: http://www.inrialpes.fr/helix.html
Imagene: http://abraxa.snv.jussieu.fr/imagene

Please contact:
Alain Viari – INRIA
Tel: +33 4 76 61 54 74
E-mail: alain.viari@inrialpes.fr

ERCIM News No. 43, October 2000
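The task model described in the Imagene article (a global task split into sub-tasks, a visible trace of each step, and versioned re-execution from any point) can be illustrated with a small sketch. This is not Imagene's or ILOG's actual engine: the `Task` class, its fields, and the two toy analysis methods are hypothetical names chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Task:
    """Hypothetical task node: either a leaf method or a list of sub-tasks."""
    name: str
    method: Optional[Callable[[dict], object]] = None
    subtasks: List["Task"] = field(default_factory=list)
    versions: List[dict] = field(default_factory=list)  # one entry per execution

    def run(self, params: dict, trace: List[str], depth: int = 0) -> dict:
        # Record every step so the whole resolution stays visible to the user.
        trace.append("  " * depth + self.name)
        if self.subtasks:
            result = {t.name: t.run(params, trace, depth + 1) for t in self.subtasks}
        else:
            result = {"value": self.method(params)}
        # Keep earlier executions instead of overwriting them.
        self.versions.append({"params": dict(params), "result": result})
        return result


# Toy stand-ins for real sequence analysis methods (assumptions, not from the article).
def count_start_codons(params: dict) -> int:
    return params["sequence"].count("ATG")


def gc_fraction(params: dict) -> float:
    seq = params["sequence"]
    return sum(base in "GC" for base in seq) / len(seq)


annotate = Task("annotate", subtasks=[
    Task("start_codons", method=count_start_codons),
    Task("gc_content", method=gc_fraction),
])

trace: List[str] = []
first = annotate.run({"sequence": "ATGGCCATGA"}, trace)

# Restart a single sub-task with other parameters; both executions are kept.
gc_task = annotate.subtasks[1]
second = gc_task.run({"sequence": "GGCC"}, trace, depth=1)
```

Keeping a list of versions per task, rather than a single result, is one simple way to hold several hypotheses side by side, as the article describes.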
Arevir: Analysis of HIV Resistance Mutations
by Niko Beerenwinkel, Joachim Selbig, Rolf Kaiser and Daniel Hoffmann

The aim of a collaboration funded by Deutsche Forschungsgemeinschaft is to develop tools that assist medical practitioners in finding an optimal therapy for HIV-infected patients. The collaboration was started this year by researchers at GMD, the University of Cologne, CAESAR (the Center of Advanced European Studies and Research, Bonn) and a number of cooperating university hospitals in Germany.

The Human Immunodeficiency Virus (HIV) causes the Acquired Immunodeficiency Syndrome (AIDS). Currently, there are two types of drugs in the fight against HIV, namely inhibitors of the two viral enzymes protease (PR) and reverse transcriptase (RT). Since HIV shows a very high genomic variability, mutations occur even under the usual combination therapy of several drugs (HAART, highly active antiretroviral therapy). These mutations confer resistance to the prescribed drugs and even to drugs not yet prescribed (cross resistance). Therefore, the treating physician is faced rather frequently with the problem of finding a new therapy.

Clinical trials have shown that therapy changes based on a genotypic resistance test, i.e. sequencing the viral genes coding for PR and RT and looking for mutations known to cause resistance, result in significantly better therapy success. However, not all patients benefit from therapy changes after resistance testing. There are several possible reasons for therapy failure in this situation: the occurrence of an HIV strain resistant to all available antiretroviral drugs, an insufficient drug level in the patient, or a drug combination that was not able to suppress the virus sufficiently.

The latter occurs because the relations between observed mutations, phenotypic resistance and therapy success are poorly understood. While the PR inhibitors, for example, all bind in the catalytic center of this enzyme, mutations associated with resistance occur at many different locations spread all over the three-dimensional structure of the protein (see Figure 1). The relation between point mutations and drug resistance remains unclear in many cases, not to speak of the interpretation of more complex mutation patterns.

The goal of the Arevir project is to develop bioinformatics methods that help to understand these connections and that contribute directly to therapy optimization. In a first step, a database is being set up in collaboration with project partners from university hospitals and virological institutes, in which clinical data, sequence data and phenotypic resistance data are collected. These correlated data are used to learn about the outcome of a therapy as a function of the drugs that make up the therapy and the genotype of the two relevant enzymes, PR and RT. A successful outcome of a therapy can be measured as a substantial reduction in virus load (i.e. the number of virus particles measured in the patient’s blood plasma; see Figure 2). We can formulate a classification problem on the set of all pairs consisting of the therapy’s drug components and the amino acid sequences of PR and RT, assigning to each such pair either the class ‘successful’ or ‘unsuccessful’.

Figure 1: Ribbon representation of the HIV protease homodimer (blue and green) complexed with an inhibitor (red). Some resistance-associated mutations (codon positions 10, 20, 36, 46, 48, 50, 54, 63, 71, 82, 84 and 90) are indicated in ball-and-stick mode.

Figure 2: Virus load and therapies of a patient over a period of more than 100 weeks. The measured data points have been interpolated, values under the limit of detection have been normalized to this limit (400 copies/ml), and drugs are encoded in a three-letter code.
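The classification problem formulated in the Arevir article can be made concrete with a minimal sketch of how one (therapy, genotype) pair might be turned into a feature vector and a class label. Several things here are assumptions rather than the project's method: the drug list is illustrative, the genotype is reduced to the protease codon positions named in the Figure 1 caption, and the threshold for a "substantial reduction" in virus load (`MIN_LOG10_DROP`) is a hypothetical value, since the article does not give one. Only the 400 copies/ml detection limit comes from the Figure 2 caption.

```python
import math
from typing import List, Set

# Protease codon positions listed in the Figure 1 caption.
PR_POSITIONS = [10, 20, 36, 46, 48, 50, 54, 63, 71, 82, 84, 90]
# Illustrative three-letter drug codes; the project's actual list is not given.
DRUGS = ["AZT", "3TC", "IDV", "NFV"]

DETECTION_LIMIT = 400.0  # copies/ml, from the Figure 2 caption
MIN_LOG10_DROP = 1.0     # assumed threshold, not stated in the article


def encode(drugs: Set[str], mutated_positions: Set[int]) -> List[int]:
    """Binary features: one bit per drug, then one bit per codon position."""
    drug_bits = [1 if d in drugs else 0 for d in DRUGS]
    mutation_bits = [1 if p in mutated_positions else 0 for p in PR_POSITIONS]
    return drug_bits + mutation_bits


def label(load_before: float, load_after: float) -> str:
    """Class label from the virus load before and after a therapy change."""
    # Values under the detection limit are normalized to the limit.
    before = max(load_before, DETECTION_LIMIT)
    after = max(load_after, DETECTION_LIMIT)
    drop = math.log10(before) - math.log10(after)
    return "successful" if drop >= MIN_LOG10_DROP else "unsuccessful"


features = encode({"AZT", "IDV"}, {46, 82})
outcome_good = label(50000.0, 100.0)
outcome_poor = label(50000.0, 20000.0)
```

Vectors and labels of this kind could then be passed to any standard classifier; the choice of learning method is left open here, as it is in the article.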
