21.11.2014 Views

ayout 1 - EMBL Grenoble


EMBL Research at a Glance 2009

Ewan Birney
PhD 2000, Sanger Institute, Hinxton, Cambridge.
Team leader at EMBL-EBI since 2000.

PANDA nucleotides and methods for genome analysis

Previous and current research

Ewan Birney is joint head of the PANDA team, with Rolf Apweiler, and has strategic oversight of the major DNA projects: Ensembl, Ensembl Genomes and the European Nucleotide Archive (ENA). These are large projects, all dealing with DNA sequence information in a variety of forms, in particular the annotation and interpretation of genomes. DNA sequence remains at the heart of molecular biology, and hence of bioinformatics, and its use has grown significantly with the recent advent of ultra-high-throughput DNA sequencing machines. In 2008 we saw striking growth in two areas: the use of these new machines for surveying natural variation in populations, in particular the human population, and the more routine determination of genotypes from large disease cohorts, leading to associations between genetics and disease. The shift in technology and the repositioning of genomic information as a key organising principle have meant significant changes to the way our DNA archival services operate and more focus on coordinating with genomic resources.

In addition, the Birney research group focusses on DNA sequence interpretation. There are two major themes to this research. The first is algorithm development. There have been a number of algorithmic developments in the Birney group, in particular on sequence alignment methods (Slater & Birney, BMC Bioinformatics), multiple alignments (Paten et al., Genome Research) and de novo assembly using short reads (Zerbino & Birney, Genome Research). The second is data-driven discovery of important features in the genome. This includes large projects, such as the ENCODE project (The ENCODE Consortium, Nature), which involves a large number of experimental groups focussing on the interpretation of genomic information, particularly from non-coding DNA sequence. Integration across different data types provides new insights, for example the surprising lack of correlation between conservation and experimentally assayed function. There are also more specific, focussed projects, such as the exploration of cis-regulation in vertebrates (Ettwiller et al., Genome Biology), in which specific new data discovery techniques are developed to elucidate genomic function.

Figure showing the expression of synthetic enhancers designed using algorithms from the cis-regulatory research performed in the group. The arrows show tissue-specific expression in medaka fish embryos from these ab initio designed enhancers.

Future projects and goals

Future research continues both of these themes, algorithm development and data-driven discovery, both relating to genomic DNA sequence, but will also add the use of intra-species variation (i.e. natural variation in a population) with molecular markers as a component. Leveraging the natural polymorphisms in different populations allows us to understand how molecular function varies between individuals, and how this variation is correlated with the genotype of each individual. In the human genome, this is very often done in the context of specific diseases, so one has genotype, functional information and disease status. In other organisms (for example, rodents), one has more controlled phenotype measurement at the organism level, allowing more complex scenarios to be explored.
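As a rough illustration of relating molecular variation to genotype, the sketch below computes a Pearson correlation between allele dosage at a single variant site and a molecular measurement across individuals. It is not the group's method; the function name `pearson_r`, the 0/1/2 dosage encoding and the toy values are all assumptions made up for this example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data (invented for illustration, not from the source):
# allele dosage (copies of the alternative allele) for six individuals,
# and a made-up molecular measurement for the same individuals.
dosage = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 1.9, 2.1, 3.0, 3.2]

r = pearson_r(dosage, expression)
print(f"correlation between dosage and measurement: r = {r:.3f}")
```

A real analysis would also need to account for population structure, covariates and multiple testing across many variant sites; a single correlation coefficient only conveys the basic idea.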

Selected references

Ettwiller, L. et al. (2008). Analysis of mammalian gene batteries reveals both stable ancestral cores and highly dynamic regulatory sequences. Genome Biol., 9, R172

Paten, B. et al. (2008). Enredo and Pecan: genome-wide mammalian consistency-based multiple alignment with paralogs. Genome Res., 18, 1814-1828

Zerbino, D.R. & Birney, E. (2008). Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. Genome Res., 18, 821-829

The ENCODE Consortium (2007). Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature, 447, 799-816

Slater, G.S. & Birney, E. (2005). Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics, 6, 31
