bbc 2015

Recommendations

Info

BeNeLux Bioinformatics Conference – Antwerp, December 7-8 2015 Abstract ID: P Poster 10th Benelux Bioinformatics Conference bbc 2015 P32. ON THE LZ DISTANCE FOR DEREPLICATING REDUNDANT PROKARYOTIC GENOMES Raphaël R. Léonard 1,2* , Damien Sirjacobs², Eric Sauvage 1 , Frédéric Kerff 1 & Denis Baurain². Centre for Protein Engineering, University of Liège 1 ; PhytoSYSTEMS, University of Liège 2 . * rleonard@doct.ulg.ac.be The fast-growing number of available prokaryotic genomes, along with their uneven taxonomic distribution, is a problem when trying to assemble broadly sampled genome sets for phylogenomics and comparative genomics. Indeed, most of the new genomes belong to the same subset of hyper-sampled phyla, such as Proteobacteria and Firmicutes, or even to single species, such as Escherichia coli (almost 2000 genomes as of Sept 2015), while the continuous flow of newly discovered phyla prompts for regular updates. This situation makes it difficult to maintain sets of representative genomes combining lesser known phyla, for which only few species are available, and sound subsets of highly abundant phyla. An automated straightforward method is required but none are publicly available. The LZ distance, in conjunction with the quality of the annotations, can be used to create an automated approach for selecting a subset of representative genomes without redundancy. We are planning to release this tool on a website that will be made publicly available. INTRODUCTION The LZ distance (Lempel and Ziv, 1977; Otu and Sayood, 2003) is inspired by compression algorithms, such as gzip or WinRAR. This distance, amongst others, has already been used in attempts to produce alignment-free phylogenetic trees (Bacha and Baurain, 2005; Hohl et al. 2007), though the results were disappointing in such a context (due to the heterogeneity of the substitution process at large evolutionary scales). However, the LZ distance is likely to provide enough resolving power to identify groups of redundant genomes and to keep only one representative for each group. METHODS For each pair of genomes A and B, the LZ distance is computed from the gzip-compressed file lengths of the corresponding nucleotide assemblies s(A) and s(B) and of their concatenations s(A+B) and s(B+A). These distances, along with taxonomic information, are stored in a database. A clustering method is then applied to regroup the similar genomes into a user-specified number of groups. For each of these groups, a representative is chosen based on the quality of the genomic assemblies (chromosomes rather than scaffolds) and of the protein annotations (e.g., few rather than many “unknown proteins”). RESULTS & DISCUSSION Our method using the LZ distance is currently under development using the genomes from the release 28 of Ensembl Bacteria (ftp://ftp.ensemblgenomes.org/pub/ bacteria/release-28/). It contains 20,950 unique prokaryotic genomes, composed of 286 Archaea and 20,664 Bacteria. The three most represented phyla are the Proteobacteria (8642, of which 1980 E. coli), the Firmicutes (7766) and the Actinobacteria (2673). These genomes are already the result of a pre-processing step designed to remove extra assemblies for strains present in multiple copies (due to parallel sequencing or resequencing in different labs). We are working on different approaches for validating our dereplication method, based on (1) current taxonomy, (2) 16S rRNA phylogeny, and (3) clustering using genomic signatures (Moreno-Hagelsieb et al. 2013). First, we compute a central measure of the taxonomic “purity” of all genome clusters, which reflects the amount of “mixture” at different taxonomic levels (phylum, class, order etc). A good clustering should regroup different genera (or species) without amalgamating distinct classes (or phyla). Second, we cut the branches of a large 16S rRNA tree based on the same genome collection to produce an equal number of groups to compare with our clustering method. We then compute a statistic of the overlap between the 16S subtrees and the LZ clusters. A good clustering should have a reasonable overlap with the gold standard that is the 16S rRNA tree. Third, using the same overlap metric, we compare the LZ clusters to clusters obtained using the genomic signature. Finally, an interactive tool will be made available through a website. It will allow the users to download precomputed sets of representative genomes for either the complete database or for taxonomic subsets. We are also planning to allow users to upload their own genomes to cluster them with the LZ method. REFERENCES Ziv, J. and a. Lempel. 1977. ‘A Universal Algorithm for Sequential Data Compression.’ IEEE Transactions on Information Theory 23.3. doi:10.1109/TIT.1977.1055714. Otu, H. H. and K. Sayood. 2003. ‘A New Sequence Distance Measure for Phylogenetic Tree Construction.’ Bioinformatics 19.16: 2122–2130. doi:10.1093/bioinformatics/btg295. Moreno-Hagelsieb, G., Z. Wang, S. Walsh and A. Elsherbiny. 2013. ‘Phylogenomic Clustering for Selecting Non-Redundant Genomes for Comparative Genomics.’ Bioinformatics 29.1: 947–949. doi:10.1093/bioinformatics/btt064. Höhl, M. and M. a Ragan. 2007. ‘Is Multiple-Sequence Alignment Required for Accurate Inference of Phylogeny?’ Systematic biology 56.2: 206–221. doi:10.1080/10635150701294741. Bacha, S. and Baurain, D. 2005. ‘Application of Lempel-Ziv complexity to alignment-free sequence comparison of protein families’. Benelux Bioinformatics Conference 2005. http://hdl.handle.net/2268/80179 76
BeNeLux Bioinformatics Conference – Antwerp, December 7-8 2015 Abstract ID: P Poster 10th Benelux Bioinformatics Conference bbc 2015 P33. THE ROLE OF MIRNAS IN ALZHEIMER’S DISEASE Ashley Lu 1,2* , Annerieke Sierksma 1,2 , Bart De Strooper 1,2 & Mark Fiers 1,2 . VIB Center for the Biology of Disease 1 ; KU Leuven Center for Human Genetics 2 . * ashley.lu@cme.vib-kuleuven.be MicroRNAs (miRNA) play an important role in post-transcriptional regulation and were shown to be dysregulated in Alzheimer’s disease. By analysing the hippocampal miRNA and mRNA expression of two mouse models of Alzheimer’s disease, we identify a set of miRNAs that are dysregulated with the onset of cognitive impairments. Using GO enrichment analysis we aim to identify miRNAs that likely play a role in learning and memory. INTRODUCTION MiRNAs are small non-coding RNAs involved in posttranscriptional regulation through mRNA inhibition or degradation. Past studies have suggested miRNAs to play a direct role in Alzheimer’s disease (AD), e.g. by modulating the expression of genes involved in the formation of neuropathological protein aggregates (Lau P & De Strooper B, 2010). In this study, we investigated the changes in miRNA and mRNA expression in two AD mouse models: APPswe/PS1 L166P (Radde R, 2006) and Thy-Tau22 (Schindowski K, 2006), which have similar patterns of cognitive impairment, but different pathology. We aim to better understand the functional role of miRNAs in AD-related cognitive impairments. METHODS RNA was extracted from the left hippocampus of 96 mice. The experiment covers the two models (APPswe/PS1 L166P & Thy-Tau22), with wild type controls for each. All genotypes are tested at two ages (4 and 10 months); before and after onset of cognitive impairment. This yields eight experimental groups with twelve mice each. Expression profiles of miRNAs and mRNAs were generated using Illumina single-end sequencing. Differential Expression (DE) analysis was performed using the limma package of R/Bioconductor with a linear model to test the effects of age, genotype and their interaction. Functional analysis of the mRNAs and miRNAs are conducted separately. For mRNAs, gene ontology analysis was applied to sets of the most up- and down regulated genes. To determine the functional impact of dysregulated miRNAs we determined which mRNAs are the most likely direct targets of each miRNA using the following approach: 1) for each miRNA we calculated the Pearson’s correlation coefficient to each mRNA based on the miRNA and mRNA expression data. 2) For each miRNA we extracted the predicted set of targets from Targetscan (Lewis BP & Burge CB & Bartel DP, 2005), with Diana (Maragkakis M et al. 2011) as backup when Targetscan had no record. 3) We filtered the miRNA target genes by determining the leading edge set in a GSEA PreRanked analysis (Subramanian A. et al, 2005) using the predicted target mRNAs of each miRNA against the mRNAs ranked according to the Pearson’s scores generated in step 1. We additionally investigated target sets based on a Pearson’s correlation coefficient cut-off of -0.2, -0.3, and -0.4. 4) Gene-ontology analysis was then applied to these candidate target sets to infer the likely biological function of each miRNA. RESULTS & DISCUSSION DE analysis showed that the direction of expression level changes in mRNAs are similar between APPswe/PS1 166P and Thy-Tau22 in terms of age*genotype interaction effects. However, for the miRNAs the expression pattern is less obvious. Overall, the effect size is more pronounced in APPswe/PS1 L166P mouse than the Thy-Tau22 for both miRNAs and mRNAs. Functional analyses of the down-regulated mRNAs show a clear enrichment in cognition and neural development related categories, whereas up-regulated genes show a clear inflammatory signature. Combining miRNA target prediction with miRNA/mRNA correlation analysis shows a marked increase of GO enrichment scores. This analysis strongly suggests a regulatory role for miRNAs in the down regulation of genes involved in learning, cognition and related categories. This analysis workflow has allowed focusing on a list of miRNAs that likely play a direct role in the observed learning and memory deficits in AD mouse models, and have been used to select candidate miRNAs for downstream in vivo experiments, which will hopefully provide a deeper understanding in the impact of AD on learning and cognition. REFERENCES Lau P & De Strooper B. Seminars in Cell & Developmental Biology, 21(7), 768–773, (2010). Radde R. EMBO reports, 7(9), 940–946, (2006). Schindowski K. The American Journal of Pathology, 169(2),599–616, (2006). Lewis BP & Burge CB & Bartel DP. Cell, 120,15-20 (2005). Maragkakis M et al. Nucleic Acids Research (2011) Subramanian A. et al. Proceedings of the National Academy of Sciences of the United States of America, 102(43), 15545–15550, (2005) 77
Page 1 and 2:
10 th Benelux Bioinformatics Confer
Page 3 and 4:
10th Benelux Bioinformatics Confere
Page 5 and 6:
Page 7 and 8:
Page 9 and 10:
Page 11 and 12:
Page 13 and 14:
Page 15 and 16:
Page 17 and 18:
Page 19 and 20:
BeNeLux Bioinformatics Conference -
Page 21 and 22:
Page 23 and 24:
Page 25 and 26: BeNeLux Bioinformatics Conference -
Page 75: BeNeLux Bioinformatics Conference -
Page 115: 10th Benelux Bioinformatics Confere
show all

bbc 2015

Create successful ePaper yourself

Delete template?

Save as template?