bbc 2015

BeNeLux Bioinformatics Conference – Antwerp, December 7-8 2015
Abstract ID: P
Poster
10th Benelux Bioinformatics Conference bbc 2015

P38. MINING ACROSS “OMICS” DATA FOR DRUG PRIORITIZATION

Stefan Naulaerts 1,2*, Pieter Meysman 1,2, Bart Goethals 1, Wim Vanden Berghe ,3 & Kris Laukens 1,2.
Advanced Database Research and Modeling (ADReM), University of Antwerp 1; Biomedical informatics research center Antwerpen (biomina) 2; Department for Biomedical Sciences, University of Antwerp 3.
* stefan.naulaerts@uantwerpen.be

Drug resistance and response have traditionally been investigated by means of case-by-case studies. Profiling drug compounds in this way is time- and resource-intensive. Large-scale information on gene expression and protein abundance, protein interactions, and functional and pathway annotations is now available, as are freely accessible repositories for drug targets. Structural evidence for select drug compounds is also publicly available. These data offer an enormous opportunity for data integration and pattern mining efforts across each of these levels. Here, we apply frequent itemset mining to identify structurally similar compounds, and to detect patterns within the biological effect profiles of these chemical compound families. Next, we explore how we can link both types of patterns to meta-information (such as drug interactions) in a bid to identify promising compounds and speed up the drug discovery process by means of candidate prioritization.

INTRODUCTION

In the last decades, several widely used databases have emerged. These vary from gene expression data and mass-spectrometric protein identifications to resources covering interaction graphs or functional annotations of proteins and chemicals.
The presence of these resources offers interesting opportunities to gain deeper insight into drug mode of action, and to help reduce important bottlenecks in the speed of novel drug discovery or drug repurposing, by intelligently prioritizing potentially interesting compounds.

METHODS

To integrate the listed kinds of data, we use pattern mining methods that are collectively known as “frequent itemset mining”. This set of techniques uses clever heuristics to efficiently find sets of items that occur together more often than a minimal threshold. In this work, we identified several pattern types based on their source:

- Expression itemsets
- Metadata itemsets
- Graph patterns (protein-protein, protein-drug and chemical structures)

For subgraph mining, we used GASTON [1]. All other data sources were analysed with Apriori [2]. To deal with the extreme numbers of patterns that result from mining this kind of data, we used a filter which incorporates several quality measures, based on objective data mining measures (e.g. lift) as well as more biologically inspired methods (e.g. functional coherence in the Gene Ontology [3] tree). Simple classification based on the patterns was performed with CBA [4].

RESULTS & DISCUSSION

We were able to identify several backbone patterns within the chemical structures studied and used these to define “chemical compound families”. Next, we used this classification as a starting point to group experimental evidence (bio-assays, interactions and metadata). After applying cut-offs based on the quality measures, all remaining patterns were significant and made sense biologically. Unsurprisingly, structurally similar compound families show significant pattern overlaps in drug-drug interactions, gene expression, term co-occurrence and conserved protein-protein interactions. We found that specific patterns in the biological profile often correlate with specific discriminative structural patterns.
Moreover, these collections of structural frequent subgraphs seemed highly relevant for the mode in which a compound connects to the “core” proteome. This central proteome performs essential functions of the cell (e.g. energy metabolism) and is known to be conserved across cell types. Structurally distinct compound families converge on the same “core proteins” much later (if at all) than more similar chemicals do. This observation corresponds to currently known pathway knowledge and tissue biology. We were further able to associate previously unseen compounds with chemicals present in the database, based on the subgraph collection, and by extension with the biological profile patterns. A manual survey of the literature indicated that several compounds not covered by our database have recently been approved, or are in testing, as alternative drugs to the compounds we hypothesized as being substantially similar.

FIGURE 1. Visualizing the dexamethasone environment. Both predictions and experimental evidence (drug-target and protein-protein interactions) are shown.

REFERENCES

1. Nijssen S & Kok J. ENTCS 127, 77-87 (2005).
2. Agrawal R & Srikant R. Proc 20th Int Conf on Very Large Databases (1994).
3. Ashburner M et al. Nat Genet 25, 25-29 (2000).
4. Liu B et al. KDD (1998).
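As an illustration of the frequent itemset mining and lift-based filtering named in the Methods of abstract P38, the following is a minimal sketch of a level-wise, Apriori-style search in plain Python. It is not the authors' pipeline: the transactions, the item labels (scaffold_A, target_X, pathway_P and so on) and the 0.4 support threshold are all hypothetical, chosen only to keep the example small.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset whose support
    (fraction of transactions containing it) is >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    result = dict(current)
    k = 2
    while current:
        # Join: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: a k-itemset can only be frequent if all its (k-1)-subsets are.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        result.update(current)
        k += 1
    return result


def lift(freq, antecedent, consequent):
    """Lift of the rule antecedent -> consequent; values above 1 mean the two
    itemsets co-occur more often than expected if they were independent."""
    both = freq[antecedent | consequent]
    return both / (freq[antecedent] * freq[consequent])


# Hypothetical toy data: one "transaction" per compound, listing its annotations.
transactions = [
    {"scaffold_A", "target_X", "pathway_P"},
    {"scaffold_A", "target_X"},
    {"scaffold_A", "target_X", "pathway_P"},
    {"scaffold_B", "pathway_P"},
    {"scaffold_B", "target_Y"},
]

freq = frequent_itemsets(transactions, min_support=0.4)
lift_ax = lift(freq, frozenset({"scaffold_A"}), frozenset({"target_X"}))
print(len(freq), "frequent itemsets; lift(scaffold_A -> target_X) =", round(lift_ax, 3))
```

In this toy data the rule scaffold_A -> target_X has a lift above 1, so a lift cut-off would retain it, whereas a pair of annotations that co-occurs no more often than chance would be filtered out.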
P39. ABUNDANT TRANS-SPECIFIC POLYMORPHISM AND A COMPLEX HISTORY OF NON-BIFURCATING SPECIATION IN THE GENUS ARABIDOPSIS

Polina Novikova 1, Nora Hohmann 2, Marcus Koch 2 & Magnus Nordborg 1.
Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter (VBC), A-1030 Vienna, Austria 1; Centre for Organismal Studies Heidelberg, University of Heidelberg, D-69120 Heidelberg, Germany 2.
* magnus.nordborg@gmi.oeaw.ac.at

The prevailing notion of species rests on the concept of reproductive isolation. Under this model, sister taxa should not share genetic variation unless they still hybridize, or diverged too recently for genetic drift to have eliminated shared ancestral polymorphism, and gene trees should generally agree with species trees. Advances in sequencing technology are finally making it possible to evaluate this model. We sequenced (Illumina 100 bp paired reads) multiple individuals from 26 proposed taxa in the genus Arabidopsis. Cluster analysis identified seven distinct groups, corresponding to four common species (the model species A. thaliana, plus A. arenosa, A. halleri and A. lyrata) and three species with very limited geographical distribution. However, at the level of gene trees, only the separation of A. thaliana from the remaining taxa was universally supported, and even in this case there was abundant sharing of ancestral polymorphism with the other taxa, demonstrating that reproductive isolation must be fairly recent. By considering the distribution of derived alleles, we were also able to reject a bifurcating species tree because there is clear evidence for asymmetrical gene flow between taxa. Finally, we show that the pattern of sharing and divergence between taxa differs between gene ontologies, suggesting a role for selection.
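Abstract P39 does not say which test on derived alleles was used. As one widely used example of this kind of reasoning, the sketch below computes Patterson's D (the ABBA-BABA statistic) for four taxa arranged as ((P1, P2), P3) with an outgroup. Under a strictly bifurcating tree without gene flow, the two discordant site patterns are expected to be equally frequent, so D should be close to 0; a clear excess of one pattern points to gene flow, although D alone does not establish its direction. The site patterns below are hypothetical toy data, and the function name is introduced here for illustration only.

```python
def patterson_d(sites):
    """Patterson's D from biallelic site patterns.

    Each site is a tuple (P1, P2, P3, outgroup) with 0 = ancestral allele
    and 1 = derived allele. Only ABBA and BABA sites are informative.
    """
    abba = sum(1 for s in sites if s == (0, 1, 1, 0))  # P2 and P3 share the derived allele
    baba = sum(1 for s in sites if s == (1, 0, 1, 0))  # P1 and P3 share the derived allele
    if abba + baba == 0:
        raise ValueError("no ABBA or BABA sites; D is undefined")
    return (abba - baba) / (abba + baba)


# Hypothetical toy data: 6 ABBA sites, 2 BABA sites, 10 sites that agree
# with the assumed tree (BBAA) and therefore do not enter the statistic.
toy_sites = [(0, 1, 1, 0)] * 6 + [(1, 0, 1, 0)] * 2 + [(1, 1, 0, 0)] * 10

d_value = patterson_d(toy_sites)
print("D =", d_value)
```

In practice the significance of a non-zero D is usually assessed by resampling blocks of the genome, which this sketch leaves out.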