
PDF file: EURASNET Annual Report 2008



pathways (e.g., Reactome).

While PANDA has a large number of specific projects, each with its own collaborators, data flows and distributions, there is considerable coordination of information across the entire PANDA group. For example, new DNA sequences deposited in the EMBL archive provide the foundation for genome databases, which in turn provide protein sequences for protein predictions; furthermore, the curation of protein information informs the gene structure predictions on genomes and the association of genes to function. The requirement for extensive linking of both the underlying data and the coordination of web resources is the driving force behind having one overarching group.

The philosophy of the PANDA team is to capture, organise and interpret sequence-related data, providing all information in a variety of formats, including user-friendly web sites. Wherever possible, we aim to collaborate with other groups worldwide with similar aims.

As a consequence of the reorganization, the PANDA group is moving towards a unified presentation of all the data: one web site allowing users to move from genome to transcriptome to proteome. Ensembl is to become the consolidated view; with the integration of sequence database resources, Ensembl will grow enormously in terms of organism coverage and depth of annotations.

To achieve this augmentation of the transcript data in Ensembl, ASTD is to be integrated. ASTD will provide the expertise to generate full-length transcript sequences from the 5’ transcription start site to the 3’ poly(A) signal, annotation of each transcript for expression states using eVOC and MeSH data, and comprehensive splice event annotations. These are currently not available in Ensembl, so ASTD is able to provide new technology, thereby increasing the knowledge within Ensembl.

Future work for the ASTD team is a close examination and comparison of our gene-building pipeline with that from Ensembl.
Extensive benchmarking of exon coordinates on the genome is being performed to assess the benefits and drawbacks of each method, so that the combined procedure will delineate the optimal gene model using all available data.

Within the new Ensembl Genomes structure, the ASTD data for the human, mouse and rat transcriptomes will be held within the chordate Ensembl section. The next species that will be targeted for development are within the non-chordate Ensembl Metazoa section, namely the twelve Drosophila genomes that have recently been made publicly available (Drosophila 12 Genomes Consortium, 2007). The inclusion of these species would be beneficial to the research community, providing a similar ‘look and feel’ to each genome with extensive linking between the species. Comparative analysis of the genomes is performed at the DNA-DNA level (whole genome alignments and identification of constrained elements) and the protein-protein level (determination of orthologs and paralogs within ‘genetrees’). The Ensembl web code allows the visualization of the comparative data linking related regions of the genomes, and a comprehensive Perl Application Programme Interface (API) provides efficient access for script-based data retrieval and processing.

Ensembl acts both as a Distributed Annotation System (DAS) server and client. This means that data from geographically distant servers can be displayed in the genome browser (e.g. the ContigView pages showing a region of the genome with feature tracks: genes, similarities, microarray probes, etc.). When viewing a region of the genome, the browser will send a query to the DAS servers requesting any features that map to the region being displayed. Users can define which DAS sources they would like to see, and so the system is highly configurable. This system could be used to visualise data from consortium members in the context of their genome of interest.
DAS tracks are a simple way of displaying data that can be used either ‘in house’ for private data or publicly by registering the DAS server for third parties to access. Each group retains responsibility for the data and future updates. Setting up a DAS server is relatively straightforward, and training by EBI personnel can be provided for this service.

A further database that is to be integrated into the Ensembl schema is Integr8. Currently the two databases have a very different emphasis on their chosen species: Ensembl is metazoan (Release 47 of October 2007 contains 35 species, with preliminary support for six additional species) and Integr8 is bacterial. Combining these resources therefore immensely broadens the species coverage available.
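
The region query that a DAS client sends, as described in the DAS discussion above, can be illustrated with a minimal Python sketch. It assumes the DAS 1.x convention of a `features` command with a `segment=region:start,end` parameter and a DASGFF-style XML reply. The server address, source name, feature identifier and sample response are hypothetical placeholders, not real consortium or Ensembl resources, and the sketch is not the code Ensembl itself uses.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def das_features_url(server, source, seq_region, start, end):
    """Build a DAS 1.x style 'features' request for one genomic region.

    The URL layout (server/das/source/features?segment=...) is assumed from
    the DAS 1.x convention; check it against the server being queried.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    query = urlencode({"segment": f"{seq_region}:{start},{end}"}, safe=":,")
    return f"{server.rstrip('/')}/das/{source}/features?{query}"


def parse_features(xml_text):
    """Return (segment id, feature id, start, end) tuples from a DASGFF reply."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.iter("SEGMENT"):
        for feature in segment.iter("FEATURE"):
            features.append((
                segment.get("id"),
                feature.get("id"),
                int(feature.findtext("START")),
                int(feature.findtext("END")),
            ))
    return features


# Hypothetical reply, hand-written for illustration only.
SAMPLE_RESPONSE = """<DASGFF>
  <GFF version="1.0" href="http://das.example.org/das/my_source/features">
    <SEGMENT id="7" start="1000" stop="2000">
      <FEATURE id="probe_1" label="probe_1">
        <TYPE id="microarray_probe">microarray probe</TYPE>
        <START>1200</START>
        <END>1260</END>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""

if __name__ == "__main__":
    # The browser would issue one such request per configured DAS source.
    print(das_features_url("http://das.example.org", "my_source", "7", 1000, 2000))
    print(parse_features(SAMPLE_RESPONSE))
```

Because each request names only a source and a region, a group can keep its data on its own server and still have it drawn as a track alongside the genome annotation, which is the division of responsibility described above.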
