bbc 2015

BeNeLux Bioinformatics Conference – Antwerp, December 7-8 2015
Abstract ID: O16
Oral presentation
10th Benelux Bioinformatics Conference bbc 2015

O16. BIOINFORMATICS TOOLS FOR ACCURATE ANALYSIS OF AMPLICON SEQUENCING DATA FOR BIODIVERSITY ANALYSIS

Mohamed Mysara 1-3, Yvan Saeys 4,5, Natalie Leys 1, Jeroen Raes 2,6 & Pieter Monsieurs 1*.

Unit of Microbiology, Belgian Nuclear Research Centre SCK•CEN, Mol, Belgium 1; Department of Bioscience Engineering, Vrije Universiteit Brussel VUB, Brussels, Belgium 2; Department of Structural Biology, Vlaams Instituut voor Biotechnologie VIB, Brussels, Belgium 3; Data Mining and Modeling Group, VIB Inflammation Research Center, Ghent, Belgium 4; Department of Respiratory Medicine, Ghent University Hospital, Ghent, Belgium 5; Department of Microbiology and Immunology, REGA institute, KU Leuven, Belgium 6.
* pmonsieu@sckcen.be

High-throughput sequencing technologies have created a wide range of new applications, including in the field of microbial ecology. Yet when used in 16S rRNA biodiversity studies, they suffer from two important problems: the presence of PCR artefacts (called chimeras) and sequencing errors introduced by the sequencing technologies themselves. In this work, three artificial intelligence-based algorithms, CATCh, NoDe and IPED, are proposed to handle these two problems. A benchmarking study compared CATCh/NoDe (for 454 pyrosequencing) and CATCh/IPED (for Illumina MiSeq sequencing) with other state-of-the-art tools, showing a clear improvement in chimera detection and in the reduction of sequencing errors, respectively, and in general leading to more accurate clustering of the sequencing reads into Operational Taxonomic Units (OTUs). All algorithms are available via http://science.sckcen.be/en/Institutes/EHS/MCB/MIC/Bioinformatics/.

INTRODUCTION
The revolution in new sequencing technologies has led to an explosion of possible applications, including new opportunities for microbial ecological studies through 16S rDNA amplicon sequencing. However, within such studies, all sequencing technologies suffer from the presence of erroneous sequences, i.e. (i) chimeras, introduced by incorrect target amplification during PCR, and (ii) sequencing errors originating from different factors during the sequencing process. Effective algorithms are therefore needed to remove those erroneous sequences so that microbial diversity can be assessed accurately.

METHODS
First, a new algorithm called CATCh (Combining Algorithms to Track Chimeras) was developed by integrating the output of existing chimera detection tools into a new, more powerful method. Second, NoDe (Noise Detector) was introduced, an algorithm that identifies and corrects erroneous positions in 454 pyrosequencing reads. Third, the IPED (Illumina Paired End Denoiser) algorithm was developed, the first tool in the field to handle error correction in Illumina MiSeq sequencing data. After the positions likely to contain an error have been identified, the affected reads are clustered with correct reads, resulting in error-free consensus reads. The three algorithms were benchmarked against state-of-the-art tools.

RESULTS & DISCUSSION
In a comparative study with other chimera detection tools, CATCh outperformed all of them, increasing the sensitivity by up to 14% (see Figure 1).

FIGURE 1. Plot indicating the effect of applying 5% indels (shown on the left) and 5% mismatches (shown on the right) on the performance of different chimera detection tools. CATCh was found to outperform other existing tools.

Similarly, NoDe and IPED were benchmarked against other denoising algorithms, showing a significant improvement in error-rate reduction of up to 55% and 75%, respectively (see Figure 2).
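
The idea of integrating the output of several chimera detection tools, as described in METHODS, can be illustrated with a deliberately simple majority vote. This is only a sketch: the abstract does not say how CATCh combines the individual tools, so the voting rule, the tool names (`tool_a`, `tool_b`, `tool_c`) and the per-read calls below are all hypothetical and are not the CATCh method itself.

```python
from typing import Dict, List


def majority_vote(calls: Dict[str, List[bool]], min_votes: int) -> List[bool]:
    """Flag a read as chimeric when at least `min_votes` tools flag it.

    `calls` maps a tool name to one boolean per read (True = chimeric).
    A simple vote is an assumed stand-in for a trained ensemble classifier.
    """
    tool_calls = list(calls.values())
    n_reads = len(tool_calls[0])
    if any(len(c) != n_reads for c in tool_calls):
        raise ValueError("every tool must report one call per read")
    return [sum(c[i] for c in tool_calls) >= min_votes for i in range(n_reads)]


# Hypothetical calls from three unnamed detectors on four reads.
calls = {
    "tool_a": [True, False, True, False],
    "tool_b": [True, False, False, False],
    "tool_c": [False, False, True, True],
}

flags = majority_vote(calls, min_votes=2)
print(flags)  # [True, False, True, False]
```

Lowering `min_votes` trades specificity for sensitivity; a trained classifier would learn such a trade-off from labelled data rather than fixing it by hand.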

The combined effect of our algorithms for chimera removal and error correction also had a positive effect on the clustering of reads into operational taxonomic units (OTUs), with an almost perfect correlation between the number of OTUs and the number of species present in the mock communities. When our improved pipeline containing CATCh and NoDe was applied to a 454 pyrosequencing mock dataset, it reduced the number of OTUs to 28 (i.e. close to 18, the correct number of species). In contrast, running the straightforward pipeline without our algorithms inflated the number of OTUs to 98. Similarly, when a pipeline integrating CATCh and IPED was tested on Illumina MiSeq sequencing data obtained for a mock community, the number of OTUs returned was 33 (i.e. close to the real number of 21 species), while 86 OTUs were obtained using the default mothur pipeline.

REFERENCES
Mysara M., Leys N., Raes J., Monsieurs P. NoDe: a fast error-correction algorithm for pyrosequencing amplicon reads. BMC Bioinformatics 16:88 (2015), p. 1-15. ISSN 1471-2105.
Mysara M., Saeys Y., Leys N., Raes J., Monsieurs P. CATCh, an Ensemble Classifier for Chimera Detection in 16S rRNA Sequencing Studies. Applied and Environmental Microbiology 81:5 (2015), p. 1573-1584. ISSN 0099-2240.
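
As a quick check on the mock-community results reported in abstract O16, the OTU counts can be turned into two simple inflation measures: the number of excess OTUs and the number of OTUs per true species. The counts are taken from the abstract; the two measures and the function name `otu_inflation` are illustrative choices, not metrics used by the authors.

```python
from typing import Dict, Tuple

# (OTUs returned, species in the mock community), as reported in the abstract.
reported: Dict[str, Tuple[int, int]] = {
    "454: CATCh + NoDe": (28, 18),
    "454: pipeline without the new algorithms": (98, 18),
    "MiSeq: CATCh + IPED": (33, 21),
    "MiSeq: default mothur pipeline": (86, 21),
}


def otu_inflation(otus: int, species: int) -> Tuple[int, float]:
    """Return (excess OTUs, OTUs per true species) for one pipeline run."""
    return otus - species, otus / species


summary = {name: otu_inflation(*counts) for name, counts in reported.items()}
for name, (excess, ratio) in summary.items():
    print(f"{name}: {excess} excess OTUs, {ratio:.2f} OTUs per species")
```

On these numbers the pipelines with the new algorithms leave 10 and 12 excess OTUs, against 80 and 65 for the pipelines without them.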

Abstract ID: O17
Oral presentation

O17. GENE CO-EXPRESSION ANALYSIS IDENTIFIES BRAIN REGIONS AND CELL TYPES INVOLVED IN MIGRAINE PATHOPHYSIOLOGY: A GWAS-BASED STUDY USING THE ALLEN HUMAN BRAIN ATLAS

Sjoerd M.H. Huisman 1,2*, Else Eising 3, Ahmed Mahfouz 1,2, Lisanne Vijfhuizen 3, International Headache Genetics Consortium, Boudewijn P.F. Lelieveldt 2, Arn M.J.M. van den Maagdenberg 3,4 & Marcel J.T. Reinders 1.

DBL, Dept. of Intelligent Systems, Delft University of Technology, The Netherlands 1; LKEB, Dept. of Radiology, Leiden University Medical Center, The Netherlands 2; Dept. of Human Genetics, Leiden University Medical Center, The Netherlands 3; Dept. of Neurology, Leiden University Medical Center, The Netherlands 4.
* s.m.h.huisman@tudelft.nl

Migraine is a common brain disorder, with a heritability of around 50%. To understand the genetic component of this disease, a large genome-wide association study (GWAS) has been carried out. Several loci were identified, but their interpretation remained challenging. We integrated the GWAS results with gene expression data from healthy human brains to identify anatomical regions and biological pathways implicated in migraine pathophysiology.

INTRODUCTION
Genome-wide association studies (GWAS) are frequently used to find common variants with small effect sizes. However, they often provide researchers with short lists of single nucleotide polymorphisms (SNPs) with uncertain connections to biological functions. We present an analysis of GWAS data for migraine in which the full list of SNP statistics is used to find groups of functionally related migraine-associated genes. To this end, we make use of gene co-expression in the healthy human brain. We performed genome-wide clustering of genes, followed by enrichment analysis for migraine candidate genes.
In addition, we constructed local co-expression networks around high-confidence genes. Both approaches converge on distinct biological functions and brain regions of interest.

METHODS
Migraine GWAS data were obtained from the International Headache Genetics Consortium, with 23,285 cases and 95,425 controls (Anttila et al., 2013). Genes were scored by SNP load and divided into high-confidence genes, migraine candidate genes, and non-migraine genes. Spatial gene expression data for the healthy adult human brain were obtained from the Allen Brain Institute (Hawrylycz et al., 2012). The dataset contains microarray expression values of 3702 samples from 6 donors. Robust gene co-expression was used to cluster genes into 18 modules, which were then tested for enrichment of migraine candidate genes and functionally characterized. In a second approach, local co-expression networks were built around the high-confidence migraine genes. These local networks were then compared to the modules of the first approach.

RESULTS & DISCUSSION
The genome-wide analysis revealed several modules of genes enriched in migraine candidates. Two modules are preferentially expressed in the cerebral cortex and are enriched in synapse-related annotations and neuron-specific genes. A third module contains oligodendrocyte genes and genes preferentially expressed in subcortical regions. The local co-expression networks of the second approach converge on the same pathways and expression patterns, even though the high-confidence genes lie mostly outside the modules of interest. This provides a control for the results of the first approach.

FIGURE 1. The co-expression network around high-confidence migraine genes of the second approach. Genes (and links between them) of the migraine modules of the first approach are coloured in red, yellow, blue, and green.

The analyses confirm the previously observed link between migraine and cortical neurotransmission.
They also point to the involvement of subcortical myelination, which is in line with recent tentative findings. These results show that more relevant information can be extracted from GWAS results by using (publicly available) tissue-specific expression patterns.

REFERENCES
Anttila V. et al. Genome-wide meta-analysis identifies new susceptibility loci for migraine. Nat. Genet. 45, 912–7 (2013).
Hawrylycz M.J. et al. An anatomically comprehensive atlas of the adult human brain transcriptome. Nature 489, 391–9 (2012).
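
The module enrichment step described in the METHODS of abstract O17 (testing whether a co-expression module contains more migraine candidate genes than expected by chance) can be illustrated with a one-sided hypergeometric test. The abstract does not name the test that was used, so this is an assumed stand-in, and every count in the example is hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_genes: int, n_candidates: int,
                           module_size: int, overlap: int) -> float:
    """P(X >= overlap) when `module_size` genes are drawn without replacement
    from `n_genes`, of which `n_candidates` are candidate genes.

    Assumed enrichment test; not necessarily the one used in the study.
    """
    upper = min(n_candidates, module_size)
    total = comb(n_genes, module_size)
    tail = sum(
        comb(n_candidates, i) * comb(n_genes - n_candidates, module_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 genes, 500 candidates, a module of 1,000 genes
# containing 40 candidates (25 would be expected by chance).
p_value = hypergeom_enrichment_p(20_000, 500, 1_000, 40)
print(f"enrichment p-value: {p_value:.3g}")
```

With 18 modules tested, the resulting p-values would normally be corrected for multiple testing before a module is called enriched.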