P5. BIG DATA SOLUTIONS FOR VARIANT DISCOVERY FROM LOW COVERAGE SEQUENCING DATA, BY INTEGRATION OF HADOOP, HBASE AND HIVE
Amin Ardeshirdavani 1*, Erika Souche 2, Martijn Oldenhof 3 & Yves Moreau 1.
KU Leuven ESAT-STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics 1; KU Leuven Department of Human Genetics 2; KU Leuven Facilities for Research 3. *amin.ardeshirdavani@esat.kuleuven.be
Next Generation Sequencing (NGS) technologies allow whole human genome sequencing and thereby, among other applications, the efficient study of human genetic disorders. However, the resulting flood of sequencing data requires high computational power and an optimized programming architecture for its analysis. Many researchers therefore use scale-out networks of commodity machines in place of a supercomputer. In many such setups, Apache Hadoop coordinates distributed computation while HBase acts as the storage platform. However, scale-out networks have rarely been used to handle gene variation data from NGS, except for the assembly of sequencing reads. In this study, we propose a Big Data solution that integrates Apache Hadoop, HBase and Hive to efficiently analyze NGS output such as VCF files.
INTRODUCTION
The goal of this project is to bridge the gap between massive NGS data volumes and limited data-processing capacity. We propose a data processing and storage model tailored to NGS data and, to evaluate it, develop an application based on this model and test whether processing capacity increases substantially. The target users of this application are researchers with intermediate-level computer skills. The new model should meet several demands: scalability, fault tolerance and availability. Data import should be fast and occupy the smallest possible storage volume, and querying the data should be fast and possible from a remote location. To meet these demands, three open-source projects, Apache Hadoop, HBase and Hive, are integrated as the backbone, and on top of them an application with a user-friendly interface is developed to make the integration more straightforward.
METHODS
In general, Hadoop provides distributed MapReduce data processing, HBase serves as the platform for storing complex structured data, and Hive retrieves data from HBase using Structured Query Language (SQL)-like syntax. Although Hadoop and HBase have recently become popular, the combination of Hadoop, HBase and Hive has rarely been implemented in the bioinformatics field.
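The abstract does not give implementation details, but the standard way to expose an HBase table to Hive is Hive's HBase storage handler. The sketch below illustrates this pattern with the PyHive client; the hostname, table name (variants), column family (info) and columns are illustrative assumptions, not details taken from this work:

```python
from pyhive import hive  # assumed client; any Hive connector would do

# Connect to the Hive server running on the cluster (placeholder host/port).
conn = hive.Connection(host="hive-server.example.org", port=10000)
cursor = conn.cursor()

# Map an existing HBase table into Hive via the standard HBase storage handler.
# Table name, row-key layout and column family are assumptions for illustration.
cursor.execute("""
    CREATE EXTERNAL TABLE IF NOT EXISTS variants (
        rowkey STRING,   -- e.g. 'chrom:pos:ref:alt'
        qual   FLOAT,
        dp     INT
    )
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES (
        'hbase.columns.mapping' = ':key,info:qual,info:dp'
    )
    TBLPROPERTIES ('hbase.table.name' = 'variants')
""")

# Once mapped, the HBase data can be filtered with plain SQL-like queries.
cursor.execute("SELECT rowkey, qual FROM variants WHERE dp > 20 LIMIT 10")
for row in cursor.fetchall():
    print(row)
```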
Here we focus on the analysis of gene variation data, so application development concentrates on parsing and storing VCF (Variant Call Format) files. The application is designed to adapt dynamically to the VCF file structures produced by different variant callers. For example, UnifiedGenotyper calls SNPs and indels separately, treating each variant as independent, whereas HaplotypeCaller calls variants using local assembly. For gene variation analysis, the VCF files of different samples need to be queried, and the results should be exportable for further use. Since a VCF file for a single sample, let alone a group of samples, is typically large, processing efficiency is crucial.
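As an illustration of the parsing and storage step (the abstract does not specify the actual schema), a minimal sketch might flatten each VCF record into an HBase row keyed by chromosome, position and alleles, here written with the happybase client; the row-key design and column names are assumptions:

```python
import happybase  # Python HBase client (assumed; the actual application may differ)

def vcf_record_to_row(line):
    """Flatten one data line of a VCF file into an (assumed) HBase row."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _vid, ref, alt, qual, flt = fields[:7]
    # Composite, zero-padded row key keeps variants of one region adjacent in HBase.
    row_key = f"{chrom}:{int(pos):010d}:{ref}:{alt}"
    return row_key.encode(), {
        b"info:qual": qual.encode(),
        b"info:filter": flt.encode(),
    }

connection = happybase.Connection("hbase-master.example.org")  # placeholder host
table = connection.table("variants")

# Batched puts keep the import fast, one of the demands stated above.
with open("sample.vcf") as vcf, table.batch(batch_size=1000) as batch:
    for line in vcf:
        if line.startswith("#"):  # skip meta-information and header lines
            continue
        row_key, columns = vcf_record_to_row(line)
        batch.put(row_key, columns)
```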
The model we have chosen is the integration of Hadoop, HBase and Hive: Hadoop is used for data processing, HBase for storage and Hive for querying. Since all of these projects need a distributed cluster for optimal performance, choosing a suitable architecture for our application is crucial. The cluster serves as the main processing and storage platform, while a single server outside the cluster acts as a client for users. Through this client, our application lets researchers connect remotely to the Hive server.
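A remote query from the client machine could then look like the following sketch (again with PyHive, continuing the assumed schema above; the sample_id column is likewise an assumption), for instance to find variants shared by several samples and export them for further use:

```python
from pyhive import hive

# The client machine outside the cluster connects to the Hive server remotely.
conn = hive.Connection(host="hive-server.example.org", port=10000)
cursor = conn.cursor()

# Hypothetical query: variants observed in more than one sample.
cursor.execute("""
    SELECT rowkey, COUNT(DISTINCT sample_id) AS n_samples
    FROM variants
    GROUP BY rowkey
    HAVING COUNT(DISTINCT sample_id) > 1
""")

# Export the result as a tab-separated file for downstream analysis.
with open("shared_variants.tsv", "w") as out:
    for rowkey, n_samples in cursor.fetchall():
        out.write(f"{rowkey}\t{n_samples}\n")
```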
RESULTS & DISCUSSION
Our tests clearly show that the Apache integration performs much better than a conventional SQL model when dealing with large VCF files, and its performance on small VCF files remains acceptable. We therefore conclude that the Apache integration is a good solution for this kind of file management. Our newly developed application, H3 VCF, offers a user-friendly interface so that users without advanced IT knowledge can conveniently use the integration to handle VCF files. Users can either build their own local compute cluster or use Amazon EMR to easily create a cluster with the Apache projects preinstalled for a few dollars.
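For reference, such a cluster can be launched on Amazon EMR programmatically; the sketch below uses boto3 with placeholder names, region and instance sizes, and is an illustration rather than the procedure used in this work:

```python
import boto3  # AWS SDK for Python

emr = boto3.client("emr", region_name="eu-west-1")  # placeholder region

# Launch a small cluster with Hadoop, HBase and Hive preinstalled.
# All names, sizes and roles below are illustrative assumptions.
response = emr.run_job_flow(
    Name="h3vcf-demo-cluster",
    ReleaseLabel="emr-4.6.0",  # pick a release that ships HBase alongside Hive
    Applications=[{"Name": "Hadoop"}, {"Name": "HBase"}, {"Name": "Hive"}],
    Instances={
        "MasterInstanceType": "m3.xlarge",
        "SlaveInstanceType": "m3.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print("Cluster id:", response["JobFlowId"])
```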