bbc 2015

BeNeLux Bioinformatics Conference – Antwerp, December 7-8 2015
Abstract ID: P Poster
10th Benelux Bioinformatics Conference bbc 2015

P68. DEFINING THE MICROBIAL COMMUNITY OF DIFFERENT LACTOBACILLUS NICHES USING METAGENOMIC SEQUENCING

Sander Wuyts 1,2*, Eline Oerlemans 1, Ilke De Boeck 1, Wenke Smets 1, Dieter Vandenheuvel, Ingmar Claes 1 & Sarah Lebeer 1.
Laboratory of Applied Microbiology and Biotechnology, University of Antwerp 1; Research Group of Industrial Microbiology and Food Biotechnology (IMDO), Vrije Universiteit Brussel 2.
* Sander.Wuyts@UAntwerp.be

Next-Generation Sequencing (NGS) has revolutionized the field of microbial community analysis. Thanks to these high-throughput DNA technologies, microbiologists can now analyse microbial communities in more depth than was possible with culture-dependent methods. In our lab, we have successfully deployed 16S rDNA amplicon sequencing on the Illumina MiSeq. A bioinformatic pipeline based on mothur (Schloss et al. 2009), UPARSE (Edgar 2013) and Phyloseq (McMurdie & Holmes 2013) has been built to analyse different microbial community datasets. The focus is on functional analysis of lactobacilli and other lactic acid bacteria in different ecological niches, ranging from the human upper respiratory tract to naturally fermented plant-based foods.

INTRODUCTION

16S metagenomics is a technique that makes use of the highly conserved bacterial 16S rRNA gene. This gene codes for an RNA molecule that is a component of the 30S small subunit of bacterial ribosomes. The gene contains 9 hypervariable regions flanked by conserved regions, for which primer pairs for PCR and sequencing can be designed. Because of these characteristics and its slow rate of evolution, this gene has been widely used in bacterial phylogeny and taxonomy.

NGS technologies such as Illumina MiSeq have made it possible to study all the different 16S rRNA gene copies in an environmental sample and to use them to identify the bacteria present in that sample. However, the use of these high-throughput technologies comes at a cost: the need for a more in-depth bioinformatic analysis.

METHODS

Wetlab: DNA is extracted using sample-dependent extraction protocols. A barcoded PCR is performed on the V4 region of the 16S rRNA gene as described in Kozich et al. 2013. A different set of primers is used for each sample; each primer set contains a unique combination of barcodes. The PCR products are cleaned using AMPure XP (Agencourt) bead purification and quantified using Qubit (Life Technologies). All samples are pooled equimolarly into a single library. A negative control (an “empty” DNA extraction) and a positive control (“Mock” communities HM-276D and HM-782D) are always processed together with the samples. The library is sequenced using a dual-index sequencing strategy (Kozich et al. 2013) and a 2 x 250 bp kit on the Illumina MiSeq.

Bioinformatic analysis: Samples are demultiplexed on the MiSeq itself, allowing a 1 bp difference in the barcodes. The general quality of the reads is checked using FastQC (Babraham Bioinformatics). The paired-end reads are merged using mothur’s make.contigs command. Quality control in mothur consists of screen.seqs, alignment to the SILVA database with removal of sequences that do not map to the database, removal of chimeras using chimera.uchime, and removal of sequences that classify to the lineages “Mitochondria” and “Chloroplast”. The distances between sequences are calculated using mothur’s dist.seqs command, and the sequences are clustered at 97 % sequence similarity using mothur’s cluster command. Alternatively, the UPARSE clustering algorithm can be used for these last two steps. Sequences are classified using the RDP database and the complete dataset is exported as a .biom file.
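
To make the "1 bp difference in the barcodes" rule concrete, the following is a minimal Python sketch of mismatch-tolerant barcode assignment. It is not the MiSeq's own demultiplexing code: the function names, the single-barcode simplification (the protocol above uses a combination of two barcodes per sample) and the barcode sequences are all assumptions made for illustration.

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read: str,
                  barcodes: Dict[str, str],
                  max_mismatches: int = 1) -> Optional[str]:
    """Return the sample whose barcode is within max_mismatches of the read.

    Returns None when no barcode, or more than one barcode, matches, so that
    ambiguous reads are discarded rather than guessed.
    """
    hits = [sample for sample, barcode in barcodes.items()
            if hamming(index_read, barcode) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


# Hypothetical 8 bp barcodes, invented for this example.
BARCODES = {"sample_A": "ATCGTACG", "sample_B": "GGCATTCA"}

if __name__ == "__main__":
    # Exact match, one mismatch, and two mismatches respectively.
    for read in ("ATCGTACG", "ATCGTACC", "ATCGTTCC"):
        print(read, "->", assign_sample(read, BARCODES))
```

Tolerating one mismatch only works when every pair of barcodes in the set differs at three or more positions; otherwise a single sequencing error can make a read match two samples, which this sketch reports as unassigned.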

Visualisation and statistical analysis are performed using the R package Phyloseq. This analysis depends on the experimental design but generally consists of a normalisation step (using rarefying, proportions or a statistical mixture model (McMurdie & Holmes 2014)), a calculation of alpha diversity measures, and a calculation and visualisation of beta diversity.

RESULTS & DISCUSSION

The method described above was optimised and shown to work. We successfully used this technique to gain better insight into the role of lactobacilli in different ecological niches, e.g. the murine gastrointestinal tract, vegetable fermentations and the human upper respiratory tract.

REFERENCES

Edgar, R.C., 2013. UPARSE: highly accurate OTU sequences from microbial amplicon reads. Nature Methods, 10(10), pp.996–8.
Kozich, J.J. et al., 2013. Development of a dual-index sequencing strategy and curation pipeline for analyzing amplicon sequence data on the MiSeq Illumina sequencing platform. Applied and Environmental Microbiology, 79(17), pp.5112–20.
McMurdie, P.J. & Holmes, S., 2013. Phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data. PLoS ONE, 8(4).
McMurdie, P.J. & Holmes, S., 2014. Waste not, want not: why rarefying microbiome data is inadmissible. PLoS Computational Biology, 10(4), p.e1003531.
Schloss, P.D. et al., 2009. Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and Environmental Microbiology, 75(23), pp.7537–7541.
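
As an illustration of the normalisation and diversity steps named in the METHODS of P68, here is a small Python sketch. It does not use or reproduce the Phyloseq API (Phyloseq is an R package). The abstract does not say which diversity measures were used, so the Shannon index and Bray-Curtis dissimilarity are assumptions chosen as common examples, and proportions are used as the normalisation.

```python
import math
from typing import List, Sequence


def to_proportions(counts: Sequence[float]) -> List[float]:
    """Normalise the OTU counts of one sample to relative abundances."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts]


def shannon(counts: Sequence[float]) -> float:
    """Alpha diversity: Shannon index (natural log) of one sample."""
    return -sum(p * math.log(p) for p in to_proportions(counts) if p > 0)


def bray_curtis(a: Sequence[float], b: Sequence[float]) -> float:
    """Beta diversity: Bray-Curtis dissimilarity between two samples.

    Computed on proportions, where both samples sum to 1, so the usual
    denominator (sum of both totals) equals 2.
    """
    if len(a) != len(b):
        raise ValueError("samples must cover the same OTUs")
    pa, pb = to_proportions(a), to_proportions(b)
    return 0.5 * sum(abs(x - y) for x, y in zip(pa, pb))


if __name__ == "__main__":
    # Hypothetical OTU count vectors for two samples (same OTU order).
    sample_1 = [120, 30, 0, 50]
    sample_2 = [10, 80, 60, 50]
    print("Shannon 1:", round(shannon(sample_1), 3))
    print("Shannon 2:", round(shannon(sample_2), 3))
    print("Bray-Curtis:", round(bray_curtis(sample_1, sample_2), 3))
```

Proportions are the simplest of the three normalisation options listed in the abstract; rarefying or a mixture model would replace `to_proportions` but leave the rest of the calculation unchanged.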

P69. HUNTING HUMAN PHENOTYPE-ASSOCIATED GENES USING MATRIX FACTORIZATION

Pooya Zakeri 1,2,*, Jaak Simm 1,2, Adam Arany 1,2, Sarah Elshal 1,2 & Yves Moreau 1,2.
Department of Electrical Engineering, STADIUS, KU Leuven, Leuven 3001, Belgium 1; iMinds Medical IT, Leuven 3001, Belgium 2.
* pooya.zakeri@esat.kuleuven.be

In the last decade, the identification of phenotype-associated genes has received growing attention, yet it remains one of the most challenging problems in biology. In particular, determining disease-associated genes is a demanding process that plays a crucial role in understanding the relationship between disease phenotypes and genes. Typical approaches to gene prioritization model each disease individually, which fails to capture the common patterns in the data. This motivates us to formulate the problem of hunting phenotype-associated genes as the factorization of an incompletely filled gene-phenotype matrix, where the objective is to predict the unknown values. Experimental results on the updated version of the Endeavour benchmark demonstrate that our proposed model can effectively improve the accuracy of the state-of-the-art gene prioritization model.

INTRODUCTION

In biology, there is often a need to identify the most promising genes in a large list of candidate genes for further investigation. While a single data source might not be effective enough, fusing several complementary genomic data sources results in more accurate predictions. Moreover, fusing the phenotypic similarity of diseases and sharing information about known disease genes across both diseases and genes through a multi-task approach enables us to handle gene prioritization for diseases with very few known genes and for genes with limited available information.

Typical strategies for hunting phenotype-associated genes often model each phenotype individually [1, 2, 3, 4], which fails to capture the common patterns in the data. This motivates us to formulate the task of hunting phenotype-associated genes as the factorization of an incompletely filled gene-phenotype matrix, where the objective is to predict the unknown values.

METHODS

We consider the OMIM database, a database of human gene-phenotype (disease) associations. OMIM focuses on the relationship between human genotype and associated diseases. The OMIM database can be seen as an incomplete matrix in which each row is a gene and each column is a phenotype (disease). The idea behind factorizing the N×M OMIM matrix is to represent each row and each column by a latent vector of size D. The OMIM matrix can then be modeled as the product of an N×D gene matrix G and the transpose of an M×D disease matrix P.

Bayesian probabilistic matrix factorization (BPMF) [5] is a well-known method for filling such an incomplete matrix. However, BPMF uses no side information, which results in inaccurate gene-phenotype matrix completion. We propose an extended version of BPMF that can work with multiple sources of side information when completing the gene-phenotype matrix [6], and which also allows out-of-matrix ranking, i.e. ranking that involves genes or phenotypes not present in the gene-phenotype matrix. Our proposed framework can also integrate both genomic data sources and phenotype information, whereas earlier approaches for hunting phenotype-associated genes are limited to fusing genomic information only. This is done by adding genomic and phenotypic features to the corresponding latent variables [6].

In this study, we consider several genomic data sources, including annotation-based data sources such as UniProt annotation and literature-based data sources on each gene, as well as literature-based phenotypic information on each disease, as in [1, 4, 9]. The framework of our Bayesian data fusion model for gene prioritization is illustrated in Figure 1.

FIGURE 1.
The framework of our Bayesian data fusion model for gene prioritization.

RESULTS & DISCUSSION

We report the average true positive rate (TPR) when considering the top 1%, 5%, 10%, and 30% of the ranked genes. Experimental results on the updated version of the Endeavour [3] benchmark demonstrate that our proposed model can effectively improve the accuracy of the state-of-the-art gene prioritization model.

REFERENCES

[1] Aerts S, et al. Nat Biotech, 24(5), 537–544, (2006).
[2] De Bie T, Tranchevent LC, van Oeffelen LMM, Moreau Y. Bioinformatics, 23(13), i125–i132, (2007).
[3] Tranchevent LC, et al. NAR, (35), W377–W384, (2008).
[4] ElShal S, et al. Davis J, Moreau Y. NAR, (2015).
[5] Salakhutdinov R, Mnih A. 25th ICML, 880–887. ACM, (2008).
[6] Simm J, et al. arXiv:1509.04610 [stat.ML], (2016).
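
The factorization described in the METHODS of P69 can be sketched in a few lines of Python. This is not the authors' model: it swaps in a plain regularised factorization fitted by stochastic gradient descent, with no Bayesian inference and no side information, only to show how an incomplete matrix is approximated by the product of gene and phenotype latent vectors and then used for ranking. The toy scores, function names and hyperparameters are all assumptions.

```python
import random
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]
Observed = Dict[Tuple[int, int], float]  # (gene, phenotype) -> known score


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def factorize(observed: Observed, n_genes: int, n_phenotypes: int,
              d: int = 2, lr: float = 0.02, reg: float = 0.01,
              epochs: int = 2000, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Fit gene factors G (n_genes x d) and phenotype factors P
    (n_phenotypes x d) so that dot(G[i], P[j]) approximates each known entry.
    Unknown entries are simply skipped during training."""
    rng = random.Random(seed)
    G = [[rng.uniform(0.1, 0.5) for _ in range(d)] for _ in range(n_genes)]
    P = [[rng.uniform(0.1, 0.5) for _ in range(d)] for _ in range(n_phenotypes)]
    entries = list(observed.items())
    for _ in range(epochs):
        rng.shuffle(entries)
        for (i, j), value in entries:
            err = value - dot(G[i], P[j])
            for k in range(d):
                g, p = G[i][k], P[j][k]
                # Gradient step on squared error with L2 regularisation.
                G[i][k] += lr * (err * p - reg * g)
                P[j][k] += lr * (err * g - reg * p)
    return G, P


def rmse(observed: Observed, G: Matrix, P: Matrix) -> float:
    """Root mean squared error on the known entries."""
    sq = [(v - dot(G[i], P[j])) ** 2 for (i, j), v in observed.items()]
    return (sum(sq) / len(sq)) ** 0.5


def rank_genes(G: Matrix, P: Matrix, phenotype: int) -> List[int]:
    """Gene indices sorted by predicted score for one phenotype, best first."""
    scores = [dot(g, P[phenotype]) for g in G]
    return sorted(range(len(G)), key=lambda i: scores[i], reverse=True)


# Toy 3 x 3 matrix with two unknown entries, (0, 2) and (2, 1).
# The values are invented and are not taken from OMIM.
OBSERVED: Observed = {
    (0, 0): 1.0, (0, 1): 0.5,
    (1, 0): 2.0, (1, 1): 1.0, (1, 2): 4.0,
    (2, 0): 3.0, (2, 2): 6.0,
}

if __name__ == "__main__":
    G, P = factorize(OBSERVED, n_genes=3, n_phenotypes=3)
    print("training RMSE:", round(rmse(OBSERVED, G, P), 4))
    print("prediction for unknown entry (0, 2):", round(dot(G[0], P[2]), 3))
    print("gene ranking for phenotype 0:", rank_genes(G, P, 0))
```

In the approach described in the abstract, the latent vectors additionally depend on genomic and phenotypic features, which is what allows predictions for genes or diseases with few or no known associations; this sketch has no such mechanism.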