21.11.2014 Views

Layout 1 - EMBL Grenoble

EMBL Research at a Glance 2009

Vertebrate genomics

Paul Flicek

DSc 2004, Washington University, St. Louis, Missouri.
At EMBL-EBI since 2005.
Team leader at EMBL-EBI since 2008.

Previous and current research

The Vertebrate Genomics team is a combined service and research group that creates and manages data resources focussing on genome annotation and human variation. The major service projects of the Vertebrate Genomics team are Ensembl, the European Genotype Archive, and the Data Coordination Centre for the 1000 Genomes Project. In support of these projects, we are developing the specialised, large-scale bioinformatics infrastructure required for each analysis. The team’s research is on computational genome annotation with a particular focus on the integration of diverse data types such as extensive comparative sequencing, DNA–protein interactions, epigenetic modifications, and the DNA sequence itself.

Ensembl (www.ensembl.org) is a comprehensive genome information system featuring an integrated set of tools for genome annotation, data mining and visualisation of chordate genomes. As such, it is one of the fundamental database resources used to address questions in medical research and molecular biology. As of August 2008, there were 39 fully-supported genomes in Ensembl including human, mouse, chicken, five species of fish, a nematode, and several other mammalian, chordate and insect species.

The European Genotype Archive (EGA) database provides a permanent archive for all types of personally identifiable genetic data, including genotypes, genome sequence and associated phenotype data. The EGA contains both data collected from individuals whose consent agreements restrict release to specific approved research uses or to bona fide researchers, and data approved for full public release.

The 1000 Genomes Project (www.1000genomes.org) aims to create a comprehensive and public catalogue of common human genetic variation in three populations by using next-generation sequencing technology. During 2008, the project conducted three pilot projects to assess the feasibility of creating a deep and accurate catalogue and to develop the tools needed to manage and analyse the data. The pilot projects were: sequencing 180 individuals to 2x coverage; sequencing two trios, each consisting of a child and both parents, to 20x coverage; and targeted sequencing of 1,000 genes in 1,000 individuals.
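The scale of the two whole-genome pilots can be roughly estimated as individuals x coverage x genome size. The sketch below is only an illustration: the haploid human genome size of about 3.1 gigabases is an assumption not given in the text, and the function name `pilot_bases` is hypothetical. The targeted pilot is left out because the text does not state the total length of the 1,000 genes.

```python
# Rough raw-sequence estimate for the two whole-genome pilots described above.
# ASSUMPTION: a haploid human genome of ~3.1 gigabases (not stated in the text).
HUMAN_GENOME_BASES = 3.1e9


def pilot_bases(individuals: int, coverage: float,
                genome_bases: float = HUMAN_GENOME_BASES) -> float:
    """Approximate sequenced bases: individuals x coverage x genome size."""
    return individuals * coverage * genome_bases


# 180 individuals sequenced to 2x coverage.
low_coverage = pilot_bases(individuals=180, coverage=2)

# Two trios (a child and both parents each, so 6 individuals) at 20x coverage.
trios = pilot_bases(individuals=2 * 3, coverage=20)

print(f"low-coverage pilot: {low_coverage / 1e12:.2f} terabases")
print(f"trio pilot:         {trios / 1e12:.2f} terabases")
print(f"combined:           {(low_coverage + trios) / 1e12:.2f} terabases")
```

Under that assumption the two pilots together come to roughly 1.5 terabases of raw sequence, which is of the same order as the annual total the project reports.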

In collaboration with the NCBI, the Vertebrate Genomics team is one half of the 1000 Genomes Project Data Coordination Centre (DCC) and has co-leadership of the project’s data flow group. Over the course of the year the project produced approximately 2 terabases of sequence (equivalent to 8.5 times the number of nucleotides in the EMBL-Bank sequence archive) at a rate approaching 30 gigabases per day. This data is collected by the DCC and made available to the 1000 Genomes Project analysis group and interested researchers worldwide.
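The figures in this paragraph can be related to one another with simple arithmetic. The sketch below uses only the three numbers quoted above. The derived values (the implied size of EMBL-Bank, and how long the annual total would take at the peak rate) are back-of-the-envelope illustrations, not figures reported by the project.

```python
# Figures quoted in the text above.
TOTAL_BASES = 2e12         # "approximately 2 terabases" produced over the year
EMBL_BANK_RATIO = 8.5      # total is "8.5 times" the nucleotides in EMBL-Bank
PEAK_RATE_PER_DAY = 30e9   # "approaching 30 gigabases per day"

# Derived, illustrative values (not reported in the text).
implied_embl_bank_bases = TOTAL_BASES / EMBL_BANK_RATIO
days_at_peak_rate = TOTAL_BASES / PEAK_RATE_PER_DAY

print(f"implied EMBL-Bank size: {implied_embl_bank_bases / 1e9:.0f} gigabases")
print(f"days to produce the total at the peak rate: {days_at_peak_rate:.0f}")
```

On these numbers, the whole year's output corresponds to a little over two months of sequencing at the peak rate. That fits the description of a rate that was still being approached during the year.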

Future projects and goals

Next-generation sequencing methods are having a profound impact on our work. For example, we have been investigating ways to use short-read transcriptome data in our automatic annotation to support the substantial amounts of data we expect in the future. The availability of an increasing number of genome sequences is challenging the comparative genomics aspects of the team’s work both in terms of scale and complexity. ENCODE and the 1000 Genomes Project will contribute significant new data to the functional genomics and variation resources, respectively. Future developments for the EGA include a suite of customised data mining tools, an analysis pipeline infrastructure supporting uniform analysis of the data in the archive, and the development (in collaboration with international partners) of standards for the exchange of genotype data including whole genome sequences.

An example GenomeView from the European Genotype Archive showing genomic regions that are significantly associated with type I diabetes.

Selected references

Flicek, P. et al. (2008). Ensembl 2008. Nucleic Acids Res., 36 (Database issue), D707-D714

Johnson, D.S. et al. (2008). Systematic evaluation of variability in ChIP-chip experiments using predefined DNA targets. Genome Res., 18, 393-403

Saar, K. et al. (2008). SNP and haplotype mapping for genetic analysis in the rat. Nat. Genet., 40, 560-566

Warren, W.C. et al. (2008). Genome analysis of the platypus reveals unique signatures of evolution. Nature, 453, 175-183

Flicek, P. (2007). Gene prediction: compare and CONTRAST. Genome Biol., 8, 233
