bbc 2015


BeNeLux Bioinformatics Conference – Antwerp, December 7-8 2015
Abstract ID: P
Poster
10th Benelux Bioinformatics Conference bbc 2015

P54. TOWARDS A BELGIAN REFERENCE SET

Erika Souche 1*, Amin Ardeshirdavani 2, Yves Moreau 2, Gert Matthijs 1 & Joris Vermeesch 1.
Department of Human Genetics, KU Leuven 1; ESAT-STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven 2.
* Erika.souche@uzleuven.be

Next-Generation Sequencing (NGS) is increasingly used to study and diagnose human disorders. Because many genes are sequenced simultaneously and a large number of variants are detected, the bottleneck has moved from sequencing to variant interpretation and classification. Although publicly available databases of variant frequencies help distinguish causative mutations from common variants, they often lack population-specific variant frequencies. To circumvent this shortage of population-specific information, most genetic centers use their own sequence data from unrelated and unaffected individuals to filter out common local variants. However, these files and databases are rarely shared, and they are mainly based on whole exome data. In this project we demonstrate the utility of a local variant database generated from whole exome data, describe a procedure for sharing information between genetic centers, and mine low coverage whole genome data for common variants.

INTRODUCTION

Next-Generation Sequencing (NGS) is increasingly used to study and diagnose human disorders. Because many genes are sequenced simultaneously and a large number of variants are detected, the bottleneck has moved from sequencing to variant interpretation and classification.
Publicly available databases of variant frequencies provided by, among others, the Exome Sequencing Project (ESP), the 1000 Genomes Project (McVean et al., 2012) or dbSNP (Sherry et al., 2001) help distinguish causative mutations from common variants, identifying up to 78% of variants as common for a Belgian exome. However, these data sets often lack population-specific variant frequencies and are outperformed by databases of local variants. For example, using GoNL (The Genome of the Netherlands Consortium, 2014) alone allowed the identification of up to 85% of variants as common for the same Belgian exome. The fact that GoNL is based on only 498 individuals further highlights the importance of building and using population-specific databases. Such population-specific data can be retrieved from locally sequenced individuals who underwent Whole Exome Sequencing (WES) or Whole Genome Sequencing (WGS). Storing only the frequencies and genotype counts of the variants provides a valuable tool for variant classification, while no sensitive information on the individuals is included.

METHODS

WES data of 350 unrelated and unaffected individuals have been parsed. All samples were analysed in a similar way, i.e. reads were aligned to the reference genome with BWA (Li & Durbin, 2009) and genotyping was performed according to GATK best practices (McKenna et al., 2010; DePristo et al., 2011). All samples were genotyped at all polymorphic positions using GATK HaplotypeCaller and GenotypeGVCFs. For each position, samples with a low-quality genotype were treated as not genotyped and excluded from the genotype counts. The number of alternate alleles, allele counts and genotypes were compiled in a population VCF file, in which individual genotypes are not accessible.

Variant frequencies can also be extracted from low coverage WGS. As a pilot we processed the data of chromosome 21 of about 4,000 WGS.
The mapping was performed with BWA (Li & Durbin, 2009) and the BAM files were merged per 200 samples. All positions were genotyped using freebayes (Garrison & Marth, 2012). Genotype information for all locations outside low complexity regions was then compiled for all samples using the integration of Apache Hadoop, HBase and Hive (see poster “Big data solutions for variant discovery from low coverage sequencing data, by integration of Hadoop, Hbase and Hive”). Several models were then used to distinguish real variants from sequencing errors: the Minor Allele Frequency (MAF), the transition/transversion ratio, the expected number of loci with a MAF of 5%, etc.

RESULTS & DISCUSSION

We demonstrated the effect of our reference set on several exomes. The inclusion of only 350 individuals allowed the identification of about 3% additional common variants that are not listed as common by ESP, dbSNP (Sherry et al., 2001), 1000 Genomes (McVean et al., 2012) or GoNL (The Genome of the Netherlands Consortium, 2014). Since only the frequencies of the variants in the screened populations are reported, this file can easily be shared between laboratories. In addition, the procedure used to generate the population VCF file can easily be applied across several genetic centers in order to generate a common population VCF file, as planned within the BeMGI project. Finally, we expect that the data from WGS will further increase the performance of our reference set. A genome-wide variant frequency file from the local population will become worthwhile when WGS is routinely used in diagnostics.

REFERENCES

DePristo M et al. Nature Genetics 43, 491-498 (2011).
Exome Variant Server, NHLBI Exome Sequencing Project (ESP), Seattle, WA (URL: http://evs.gs.washington.edu/EVS/).
Garrison E & Marth G. http://arxiv.org/abs/1207.3907 (2012).
Li H & Durbin R. Bioinformatics 25, 1754-60 (2009).
McKenna A et al. Genome Research 20, 1297-303 (2010).
McVean et al. Nature 491, 56–65 (2012).
Sherry ST et al. Nucleic Acids Res. 29, 308-11 (2001).
The Genome of the Netherlands Consortium. Nature Genetics 46, 818–825 (2014).
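
The genotype-count compilation described in the METHODS of P54 can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name summarise_site, the input layout (one tuple of alternate-allele count and genotype quality per sample), the restriction to biallelic sites and the quality cutoff of 20 are all assumptions made for illustration, since the abstract does not state how "low quality" was defined. The sketch shows how low-quality genotypes can be treated as not genotyped, and how only aggregate counts and a frequency are kept, so that no individual genotype is stored.

```python
from collections import Counter

# Assumed genotype-quality cutoff; the abstract does not give a value.
MIN_GQ = 20


def summarise_site(genotypes, min_gq=MIN_GQ):
    """Collapse per-sample genotypes at one biallelic site into population counts.

    genotypes: iterable of (alt_allele_count, genotype_quality) tuples, one per
    sample. alt_allele_count is 0, 1 or 2, or None when no call was made.
    Returns only aggregate values, so individual genotypes are not recoverable.
    """
    counts = Counter()
    for alt_count, gq in genotypes:
        # Missing or low-quality genotypes are treated as not genotyped.
        if alt_count is None or gq is None or gq < min_gq:
            continue
        counts[alt_count] += 1

    n_called = sum(counts.values())
    allele_number = 2 * n_called               # chromosomes with a usable call
    allele_count = counts[1] + 2 * counts[2]   # alternate alleles observed
    frequency = allele_count / allele_number if allele_number else None

    return {
        "hom_ref": counts[0],
        "het": counts[1],
        "hom_alt": counts[2],
        "AC": allele_count,
        "AN": allele_number,
        "AF": frequency,
    }


# Hypothetical site: five samples, one low-quality call and one missing call.
example = [(0, 99), (1, 60), (2, 45), (1, 5), (None, None)]
print(summarise_site(example))
```

Because the denominator counts only samples with a usable genotype, the frequency at each site reflects the individuals actually genotyped there rather than the full cohort size.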

P55. MANAGING BIG IMAGING DATA FROM MICROSCOPY: A DEPARTMENTAL-WIDE APPROACH

Yves Sucaet 1*, Silke Smeets 1, Stijn Piessens 1, Sabrina D’Haese 1, Chris Groven 1, Wim Waelput 1 & Peter In’t Veld 1.
Department of Pathology 1, Faculty of Medicine, Vrije Universiteit Brussel, Laarbeeklaan 103, 1090 Brussels, Belgium.
* yves.sucaet@usa.net

With recent breakthroughs in whole slide imaging (WSI), almost any microscopic material can be digitized in an efficient manner. In order to mine these data efficiently, a top-down approach was employed to manage various imaging platforms. At Brussels Free University (VUB), we built a centralized infrastructure that integrates a variety of imaging platforms (brightfield, fluorescence, multi-vendor formats). With the help of the Pathomation software platform for digital microscopy, various datastores and image repositories were integrated. Custom coding was used to interact with various vendor software and server applications where needed. The end result is an interconnected network of heterogeneous, scalable information silos. We currently have two main use cases for WSI: education and biobanking. These applications are available to the public via http://www.diabetesbiobank.org.

INTRODUCTION

Too often, image analysis and data/image mining projects remain stuck in micro-environments because they are limited by vendor-specific solutions that neither scale nor interact with material from other departments or institutions. Successful roll-out of digital histopathology therefore requires more than a whole slide scanner. If the goal is for an imaging facility to allow a researcher to conduct a (microscopic) experiment, then that researcher should not be hindered by the imaging platform used.
Similarly, an instructor integrating digital content into a course should be able to make the materials as accessible as possible to as many students as possible. At Brussels Free University (VUB), we currently have two main use cases for whole slide imaging: education and biobanking. We have set these up in such a way that they are both scalable and expandable.

METHODS

Whole slide imaging (WSI) has recently provided a boost to digital capturing of microscopic content (and an explosion of data, resulting in a veritable digital treasure trove waiting to be explored by bioinformatics). But researchers have been digitizing content for a long time already through various technologies (mounted cameras, inverted fluorescent microscopes with low magnification, …). We envisioned an environment in which a researcher can manage and view all of the material related to an experiment or observation from a single interface, irrespective of origin or technology used. The following steps were taken to accomplish this:

- Set up a central server (50TB storage)
- Centrally store all imaging data and provide mapped drives on the individual workstations to facilitate a smooth transition for end-users
- Install the Pathomation platform for digital microscopy (PMA.core, PMA.view, PMA.zui) for universal viewing of digital content and to provide a uniform end-user experience
- Install Pydio (open source) for easy sharing of digital imaging content (integrated with Pathomation’s PMA.core so no duplicate user directories need to be maintained)
- Build custom portals to highlight specific collections of microscopic content and/or serve specific target audiences

RESULTS & DISCUSSION

The centralized digital imaging infrastructure is used by various researchers and graduate students. Recently over 3,000 images were processed and hosted in the course of one month.
Two use cases are worth highlighting:

- For undergraduate students (Medicine, BMS) we built custom portal websites to supplement their courses in histology and pathology. These sites are available at http://histology.vub.ac.be and http://pathology.vub.ac.be and provide students with (guided) virtual microscopy without the need to install any additional software.
- We also provide access portals to different specialized biobanks. The Willy Gepts collection represents a historic milestone in diabetes research (http://gepts.vub.ac.be) and is complementary to the Alan Foulis collection (http://foulis.vub.ac.be). Furthermore, the clinical diabetes biobank can now be consulted online, too, via http://www.diabetesbiobank.org.

CONCLUSION

Digital histopathology has been around for some time now, but often results in heterogeneous data collections. Only now are we starting to look at integrated approaches to how this varied data can best be handled. Digital pathology involves much more than the acquisition of a slide scanner. We have brought five different imaging platforms onto a single architecture. We store data from all modalities in a single storage facility and manage it through a single access point. The resulting environment assists in rendering content to any type of display device, without the need for extra software or background information concerning the content’s origin.