<strong>EMBL</strong> Research at a Glance 2009<br />
Ewan Birney<br />
PhD 2000, Sanger Institute,<br />
Hinxton, Cambridge.<br />
Team leader at <strong>EMBL</strong>-EBI<br />
since 2000.<br />
PANDA nucleotides and methods for genome<br />
analysis<br />
Previous and current research<br />
Ewan Birney is joint head of the PANDA team, with Rolf Apweiler, and has strategic oversight
of the major DNA projects: Ensembl, Ensembl Genomes and the European Nucleotide Archive
(ENA). These are large projects all dealing with DNA sequence information in a variety of forms,
in particular in the annotation and interpretation of genomes. DNA sequence remains at the heart
of molecular biology, and hence of bioinformatics, and its use has grown significantly with the recent
advent of ultra-high-throughput DNA sequencing machines. In 2008 we saw striking
growth in two areas: the use of these new machines for surveying natural variation in populations,
in particular the human population, and the more routine determination of genotypes from large
disease cohorts, leading to associations between genetics and disease. The shift in technology and
the repositioning of genomic information as a key organising principle have meant that there
have been significant changes to the way our DNA archival services operate and more focus on coordinating
with genomic resources.<br />
In addition, the Birney research group focusses on DNA sequence interpretation.<br />
There are two major themes to this research. The first is algorithm<br />
development. There have been a number of algorithmic developments in the<br />
Birney group, in particular on sequence alignment methods (Slater & Birney,<br />
BMC Bioinformatics), multiple alignments (Paten et al., Genome Research)<br />
and on de novo assembly using short reads (Zerbino & Birney,<br />
Genome Research). The second is data-driven discovery of important<br />
features in the genome. This includes large projects, such as the ENCODE<br />
project (The ENCODE Consortium, Nature), which involves a large<br />
number of experimental groups focussing on the interpretation of genomic<br />
information, particularly from non-coding DNA sequence. Integration across<br />
different data types provides new insights, for example, the surprising lack of<br />
correlation between conservation and experimentally assayed function. There are<br />
also more specific, focussed projects, such as the exploration of cis-regulation<br />
in vertebrates (Ettwiller et al., Genome Biology), in which specific new data<br />
discovery techniques are developed to elucidate genomic function.<br />
Future projects and goals<br />
Figure showing the expression of synthetic enhancers designed<br />
using algorithms from the cis-regulatory research performed in<br />
the group. The arrows show tissue-specific expression in<br />
medaka fish embryos from these ab initio designed enhancers.<br />
Future research continues both of these themes – algorithm development<br />
and data-driven discovery, both relating to genomic DNA sequence, but will<br />
also add the use of intra-species variation (i.e. natural variation in a population)<br />
with molecular markers as a component. Leveraging the natural polymorphisms in different populations allows us to understand how<br />
molecular function varies between individuals, and how this variation correlates with the genotype of each individual. For the
human genome, this is very often done in the context of specific diseases, so one has genotype, functional information and disease status. In
other organisms (for example, rodents), one has more controlled phenotype measurement at the organism level, allowing more complex scenarios<br />
to be explored.<br />
Selected references<br />
Ettwiller, L. et al. (2008). Analysis of mammalian gene batteries<br />
reveals both stable ancestral cores and highly dynamic regulatory<br />
sequences. Genome Biol., 9, R172<br />
Paten, B. et al. (2008). Enredo and Pecan: genome-wide mammalian<br />
consistency-based multiple alignment with paralogs. Genome<br />
Res., 18, 1814-1828<br />
Zerbino, D.R. & Birney, E. (2008). Velvet: Algorithms for de novo<br />
short read assembly using de Bruijn graphs. Genome Res., 18, 821-829<br />
The ENCODE Consortium (2007). Identification and analysis of<br />
functional elements in 1% of the human genome by the ENCODE<br />
pilot project. Nature, 447, 799-816<br />
Slater, G.S. & Birney, E. (2005). Automated generation of heuristics<br />
for biological sequence comparison. BMC Bioinformatics, 6, 31<br />