bbc 2015

BeNeLux Bioinformatics Conference – Antwerp, December 7-8 2015
Abstract ID: O22
Oral presentation
10th Benelux Bioinformatics Conference bbc 2015

O22. PEPSHELL: VISUALIZATION OF CONFORMATIONAL PROTEOMICS DATA

Elien Vandermarliere 1,2*, Davy Maddelein 1,2, Niels Hulstaert 1,2, Elisabeth Stes 1,2, Michela Di Michele 1,2, Kris Gevaert 1,2, Edgar Jacoby 3, Dirk Brehmer 3 & Lennart Martens 1,2.

Department of Medical Protein Research, VIB 1; Department of Biochemistry, Ghent University 2; Oncology Discovery, Janssen Research and Development – Janssen Pharmaceutica, Beerse 3.
* elien.vandermarliere@ugent.be

Proteins are dynamic molecules; they undergo crucial conformational changes induced by post-translational modifications and by binding of cofactors or other molecules. The characterization of these conformational changes and their relation to protein function is a central goal of structural biology. Unfortunately, most conventional methods to obtain structural information do not provide information on protein dynamics. Therefore, mass spectrometry-based approaches, such as limited proteolysis, hydrogen-deuterium exchange, and stable-isotope labelling, are frequently used to characterize protein conformation and dynamics, yet the interpretation of these data can be cumbersome and time consuming. Here, we present PepShell, a tool that allows interactive data analysis of mass spectrometry-based conformational proteomics studies by visualization of the identified peptides both at the sequence and structure levels. Moreover, PepShell allows the comparison of experiments under different conditions which include proteolysis times or binding of the protein to different substrates or inhibitors.

INTRODUCTION
The study of protein structure with mass spectrometry, called conformational proteomics, is frequently used to characterize protein conformations and dynamics.
Most of these methods exploit the surface accessibility of amino acids within the native protein conformation or, more specifically, the differences in surface accessibility between different states of a protein structure. The experimental setup and subsequent workflow of a conformational proteomics experiment do not deviate drastically from those of a classic mass spectrometry-based experiment in which peptides present in a complex peptide mixture are identified. The final outcome of a conformational proteomics experiment is a list of peptides. These peptide lists typically span multiple experimental conditions across which the structural observations are to be compared; the peptide lists have to be combined and, if a structure is available, mapped onto the structure of the protein. To support these latter steps, we developed PepShell (Vandermarliere et al., 2015) to guide the interpretation of mass spectrometry-based proteomics data in the context of protein structure and dynamics.

TOOL DESCRIPTION
PepShell aids the user in interpreting the outcome of conformational proteomics experiments and is composed of three panels: the experiment comparison panel, the PDB view panel, and the statistics panel.

The data to analyze
PepShell accepts input from limited proteolysis, hydrogen-deuterium exchange, MS footprinting and stable-isotope labelling experiments. The data have to be provided as a comma-separated text file. The project selection interface allows the user to select a reference project and to indicate which setups need to be compared with each other.

Experiment comparison
This panel allows the comparison of the selected experimental setups at the sequence level. For each experimental condition, the identified and quantified peptides are mapped onto the sequence of the protein of interest.

The PDB view panel
Here, the detected peptides are mapped onto the protein structure.
The main requirement is the availability of a 3D structure of the protein of interest.

Statistics within PepShell
In this panel, the peptides of interest can be analyzed in more detail. The CP-DT (Fannes et al., 2013) prediction of the tryptic cleavage probability is given for each tryptic cleavage position. A detailed comparison of the peptide ratios across the different experimental setups is also possible.

CONCLUSIONS
The increasing popularity of structural proteomics is in stark contrast with the availability of efficient tools to visualize this multitude of data. Some tools that aid data interpretation are available, but these are approach-specific and are aimed primarily at mass spectrometrists, with a specific focus on the experimental mass spectrometry data and their processing and interpretation. PepShell, on the other hand, is intended to support downstream users in interpreting the results obtained from a variety of conformational proteomics approaches. PepShell uses the peptide lists to compare different experimental conditions and allows the visualization of these differences on the structure of the protein. As such, PepShell bridges the gap between mass spectrometry-based proteomics data and their interpretation in the context of protein structure and dynamics. PepShell is an open source Java application. Its binaries, source code and documentation can be found at: compomics.github.io/projects/pepshell.html

REFERENCES
Fannes T et al. J Proteome Res 12, 2253-2259 (2013).
Vandermarliere E et al. J Proteome Res 14, 1987-1990 (2015).
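
The sequence-level mapping described under "Experiment comparison" can be illustrated with a minimal sketch. It is written in Python for brevity and is not PepShell's Java implementation; the function names, the toy protein sequence, and the condition names are all hypothetical. It assumes that mapping means locating each identified peptide as an exact substring of the protein sequence.

```python
from typing import Dict, List, Tuple


def map_peptides(protein: str, peptides: List[str]) -> List[Tuple[str, int, int]]:
    """Locate every exact occurrence of each peptide in the protein.

    Returns (peptide, start, end) tuples with 1-based, inclusive positions.
    """
    hits = []
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:
            hits.append((pep, start + 1, start + len(pep)))
            start = protein.find(pep, start + 1)
    return hits


def coverage_track(protein: str, peptides: List[str]) -> str:
    """Render the protein with residues not covered by any peptide shown as '-'."""
    covered = [False] * len(protein)
    for _, start, end in map_peptides(protein, peptides):
        for i in range(start - 1, end):
            covered[i] = True
    return "".join(aa if hit else "-" for aa, hit in zip(protein, covered))


# Hypothetical toy data: one protein, two experimental conditions.
PROTEIN = "MKTAYIAKQRQISFVK"
CONDITIONS: Dict[str, List[str]] = {
    "control": ["TAYIAK", "QISFVK"],
    "inhibitor": ["TAYIAK"],
}

TRACKS = {name: coverage_track(PROTEIN, peps) for name, peps in CONDITIONS.items()}

if __name__ == "__main__":
    for name, track in TRACKS.items():
        print(f"{name:>10}  {track}")
```

Printing the tracks one above the other shows which regions were detected in one condition but not in the other, which is the kind of difference a sequence-level comparison is meant to expose.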

Abstract ID: O23
Oral presentation

O23. INTERACTIVE VCF COMPARISON USING SPARK NOTEBOOK

Thomas Moerman 1,2,5*, Dries Decap 3,5, Toni Verbeiren 2,5, Jan Fostier 3,5, Joke Reumers 4,5, Jan Aerts 2,5.

Advanced Database Research and Modeling (ADReM), University of Antwerp 1; Visual Data Analysis Lab, ESAT – STADIUS, Dept. of Electrical Engineering, KU Leuven – iMinds Medical IT 2; Department of Information Technology, Ghent University – iMinds, Gaston Crommenlaan 8 bus 201, 9050 Ghent, Belgium 3; Janssen Research & Development, a division of Janssen Pharmaceutica N.V., 2340 Beerse, Belgium 4; ExaScience Life Lab, Kapeldreef 75, 3001 Leuven, Belgium 5.
* thomas.moerman@esat.kuleuven.be

Researchers benefit greatly from tools that allow hands-on, interactive and visual experimentation with data, unimpeded by setup complexities or by scaling issues resulting from large data sizes. In our contribution we present an implementation of an interactive VCF comparison tool, making use of a technology stack based on Apache Spark [1], Big Data Genomics Adam [2] and Spark Notebook [3].

INTRODUCTION
Current genomics data formats and processing pipelines are not designed to scale well to large datasets [1]. They were also not conceived to be used in an interactive environment. The bioinformatics field typically struggles with these difficulties, as high-throughput, next-generation sequencing jobs produce large data files. Although many high-quality bioinformatics processing tools exist, it is often hard to express analyses in a consolidated and reproducible fashion. These tools typically do not allow the user to iterate interactively on an analysis while visualizing results.

OBJECTIVE
Analysis tools should preferably provide the expressive power to define ad hoc queries on data.
Biologists or clinical researchers, when dealing with genomic variants encoded in VCF files, typically perform queries comparing one protocol to another, tumor to normal, treated to untreated cell lines and so on. Ideally these comparisons make use of all quality-related metrics stored in VCF files (e.g. coverage depth, quality score) as well as the actual region annotations (e.g. repeat regions, exonic regions) and generate visual output. We aim to implement a tool that provides the necessary expressiveness as well as the computational power needed for making these types of analyses practical and interactive.

APPROACH
Recent advances in computation platform technology (Spark) and notebook technologies (Spark Notebook) enable orchestration of distributed jobs on cluster infrastructure from a programmable environment running in a browser. These technologies, combined with Adam [2], a library specifically designed for processing next-generation sequencing data, provide the necessary architectural bedrock for our purposes. Analyses are expressed in a high-level programming language (Scala), operating on specialized data structures (Spark resilient distributed datasets, or RDDs [1]) that abstract away the complexity of defining distributed computations on data sets too large for single-node processing. Adam meets the need for an explicit data schema that abstracts over the different bioinformatics file formats.

RESULTS & CONTRIBUTIONS
Our work focuses on the pairwise comparison of annotated VCF files. Our contributions consist of two open-source Scala libraries: VCF-comp [4] and Adam-FX [5]. VCF-comp implements the concordance by variant position algorithm, which segregates the variants from two VCF inputs (A, B) into 5 categories: A-unique and B-unique, concordant (equal variants at a position), and A-discordant and B-discordant (different variants at a position).
This results in a distributed data structure from which we project visualizations, presented to the user by means of the Spark Notebook interface.

FIGURE 1 Allele frequency distribution for concordant and unique variants in a tumor vs. normal VCF comparison.

FIGURE 2 Functional impact (SnpEff annotation) histogram for concordant, unique and discordant variants in a tumor vs. normal VCF comparison.

Adam-FX extends the Adam data structures and file parsing logic in order to support queries on SnpEff [6], SnpSift [7], dbSNP and Clinvar annotations. We believe our tool facilitates the comparison of annotated VCF files in an interactive manner while reducing runtime by leveraging the Spark platform.

REFERENCES
[1] Zaharia, Matei, et al. "Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing."
[2] Massie, Matt, et al. "Adam: Genomics formats and processing patterns for cloud scale computing."
[3] https://github.com/andypetrella/spark-notebook
[4] https://github.com/tmoerman/vcf-comp
[5] https://github.com/tmoerman/adam-fx
[6] Cingolani, P, et al. "A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3.", Fly (Austin). 2012 Apr-Jun;6(2):80-92. PMID: 22728672
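
The five-way segregation described under RESULTS & CONTRIBUTIONS can be sketched on a single machine as follows. This is plain Python for illustration only. It is neither the VCF-comp Scala code nor a distributed Spark job, and the `Variant` type and the function name are hypothetical. One assumption is made explicit: a position counts as concordant only when both inputs carry exactly the same set of variants there, and as discordant otherwise. The abstract does not say how VCF-comp treats partially overlapping multi-allelic sites.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record: enough to compare calls by position."""
    contig: str
    position: int
    ref: str
    alt: str


def concordance_by_position(a: List[Variant], b: List[Variant]) -> Dict[str, List[Variant]]:
    """Split the variants of two call sets into five categories by genomic position."""
    by_pos_a: Dict[Tuple[str, int], Set[Variant]] = defaultdict(set)
    by_pos_b: Dict[Tuple[str, int], Set[Variant]] = defaultdict(set)
    for v in a:
        by_pos_a[(v.contig, v.position)].add(v)
    for v in b:
        by_pos_b[(v.contig, v.position)].add(v)

    result: Dict[str, List[Variant]] = {
        "A-unique": [], "B-unique": [], "concordant": [],
        "A-discordant": [], "B-discordant": [],
    }
    for pos in sorted(set(by_pos_a) | set(by_pos_b)):
        in_a = by_pos_a.get(pos, set())
        in_b = by_pos_b.get(pos, set())
        if not in_b:                      # position called only in A
            result["A-unique"].extend(sorted(in_a))
        elif not in_a:                    # position called only in B
            result["B-unique"].extend(sorted(in_b))
        elif in_a == in_b:                # same variants at the same position
            result["concordant"].extend(sorted(in_a))
        else:                             # same position, different variants (assumption)
            result["A-discordant"].extend(sorted(in_a))
            result["B-discordant"].extend(sorted(in_b))
    return result


# Hypothetical toy call sets, e.g. tumor (A) vs. normal (B).
A = [Variant("chr1", 100, "A", "T"), Variant("chr1", 200, "G", "C"), Variant("chr2", 50, "T", "G")]
B = [Variant("chr1", 100, "A", "T"), Variant("chr1", 200, "G", "A"), Variant("chr3", 10, "C", "T")]
CATEGORIES = concordance_by_position(A, B)

if __name__ == "__main__":
    for name, variants in CATEGORIES.items():
        print(name, variants)
```

In a Spark setting the same grouping would presumably be keyed by (contig, position) and joined across the two inputs; the dictionary-based grouping here only stands in for that step.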