bbc 2015

BeNeLux Bioinformatics Conference – Antwerp, December 7-8 2015
Abstract ID: P
Poster
10th Benelux Bioinformatics Conference bbc 2015

P52. SUPERVISED TEXT MINING FOR DISEASE AND GENE LINKS

Jaak Simm 1,2,3*, Adam Arany 1,2, Sarah ElShal 1,2 & Yves Moreau 1,2.

Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing, and Data Analytics, KU Leuven, Kasteelpark Arenberg 10, box 2446, 3001 Leuven, Belgium 1; iMinds Medical IT, Kasteelpark Arenberg 10, box 2446, 3001 Leuven, Belgium 2; Institute of Gene Technology, Tallinn University of Technology, Akadeemia tee 15A, Estonia 3.
* jaak.simm@esat.kuleuven.be

Scientific publications contain rich information about genetic disorders. Text mining these publications provides an automatic way to quickly query and summarize the information. We propose a supervised learning approach that builds on the well-known unsupervised approach TF-IDF (term frequency–inverse document frequency) and combines it with supervised learning that uses the logistic loss as its error metric. The preliminary results on the OMIM dataset look promising.

INTRODUCTION

Scientific publications contain rich information about genetic disorders. Text mining these publications provides an automatic way to quickly query and summarize the information. Traditional approaches employ unsupervised text mining methods such as TF-IDF (term frequency–inverse document frequency) or Latent Dirichlet Allocation (LDA; Blei et al., 2003) for linking terms to genes and diseases. Beegle (ElShal et al., 2015), a recent text mining tool developed for linking diseases and genes, takes this approach and uses TF-IDF as its similarity metric.

PROPOSED METHOD

We propose supervised learning of the importance of the textual terms, which can automatically filter out many terms that are unnecessary for the task at hand.
We formulate it as a prediction of supervised values y, given the terms, for all genes g and all diseases d, where i is the index of the term, w_i is the weight for term i, and σ is the sigmoid function. The main idea is to learn the weight vector w that minimizes the difference between the known values y and the predictions. The minimization can be transformed into a logistic regression.

For the supervised values we use the OMIM database (Hamosh et al., 2005). More specifically, y is 1 if there is a link between the given gene-disease pair and 0 if there is no link. Intuitively, in this setup the text mining task is transformed into a classification problem. We use a dataset of 330 OMIM terms and their linked genes, and randomly sample genes as negatives for each disease.

For the textual terms we use MEDLINE abstracts as the source of biomedical text. We employ MetaMap (Aronson & Lang, 2010) to link terms with abstracts. We use GeneRIF to link genes with abstracts, and PubMed to link diseases with abstracts. We apply a TF-IDF transformation to score a term for a given disease or gene based on the abstracts linked to each entity. We only use the terms linked to abstracts that belong to genes. Hence our vocabulary consists of 66,883 terms.

RESULTS & DISCUSSION

The preliminary results show that supervised learning automatically picks up the informative keywords, improving the recall of genes that are related to genetic disorders. We will present more detailed results in the poster. We are also investigating how to integrate the supervised approach into the answers that Beegle provides to online queries.

REFERENCES

Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993-1022.
Hamosh, A., Scott, A. F., Amberger, J. S., Bocchini, C. A., & McKusick, V. A. (2005). Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Research, 33(suppl 1), D514-D517.
ElShal, S., Tranchevent, L. C., Sifrim, A., Ardeshirdavani, A., Davis, J., & Moreau, Y. (2015). Beegle: from literature mining to disease-gene discovery. Nucleic Acids Research, gkv905.
Aronson, A. R., & Lang, F. M. (2010). An overview of MetaMap: historical perspective and recent advances. Journal of the American Medical Informatics Association, 17(3), 229-236.
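
The supervised formulation in the Proposed Method section of P52 can be illustrated with a minimal sketch. The displayed formula is not legible in this copy of the abstract, so the exact form used here is an assumption: the link probability for a gene-disease pair is taken to be σ(Σ_i w_i · x_g,i · x_d,i + b), where x_g,i and x_d,i are the TF-IDF scores of term i for gene g and disease d. The intercept b, the L2 penalty, the learning rate, and all function names are hypothetical choices made for illustration, and plain gradient descent stands in for whatever optimizer the authors used.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def pair_features(gene_scores, disease_scores):
    """Assumed feature map: per-term product of the two TF-IDF profiles."""
    return [g * d for g, d in zip(gene_scores, disease_scores)]

def predict_link(w, b, gene_scores, disease_scores):
    """Predicted probability that the gene and the disease are linked."""
    x = pair_features(gene_scores, disease_scores)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fit_term_weights(pairs, labels, lr=0.5, n_iter=1000, l2=1e-3):
    """Learn term weights w by minimizing the mean logistic loss.

    pairs  : list of (gene_scores, disease_scores) tuples
    labels : 1 for a known link (e.g. from OMIM), 0 for a sampled negative
    """
    X = [pair_features(g, d) for g, d in pairs]
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0  # intercept: an assumption, not mentioned in the abstract
    for _ in range(n_iter):
        grad_w = [l2 * wi for wi in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for x, y in zip(X, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi / n
            grad_b += err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b
```

In this sketch a term that scores highly for both a disease and its linked genes, but not for the sampled negatives, receives a large positive weight, while terms that are equally common in positives and negatives stay close to zero. That is one way to read the abstract's claim that uninformative terms are filtered out automatically.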

P53. FLOWSOM WEB: A SCALABLE ALGORITHM TO VISUALIZE AND COMPARE CYTOMETRY DATA IN THE BROWSER

Arne Soete 2, Sofie Van Gassen 1,2,3, Tom Dhaene 1, Bart N. Lambrecht 2,3 & Yvan Saeys 2,3.

Department of Information Technology, Ghent University-iMinds, Ghent, Belgium 1; Inflammation Research Center, VIB, Ghent, Belgium 2; Department of Respiratory Medicine, Ghent University Hospital, Ghent, Belgium 3.

We developed FlowSOM Web, a web tool that visualizes cytometry data based on Self-Organizing Maps. Similar cells are clustered and visualized via star charts. This allows us to process and display millions of cells efficiently. Additionally, different biological samples (e.g. healthy versus diseased mice) can be compared.

INTRODUCTION

Cytometry data describes cell characteristics in biological samples. Cells are labeled with fluorescent antibodies and a flow cytometer measures the properties of millions of cells one by one. Biologists use this information to gain more insight into diseases and to diagnose patients. Most of them still analyse this data manually to differentiate between the cell types present. This is done by plotting the data in 2D scatter plots and selecting groups of cells in a hierarchical way, a process called 'gating'. Recently, the number of properties that can be measured simultaneously has strongly increased. As the number of possible 2D scatter plots increases exponentially with the number of properties measured, it becomes infeasible to analyze them all, and relevant information that is present in the data might be missed.

METHODS

We present FlowSOM, a new algorithm for the visualization and interpretation of cytometry data (Van Gassen et al., 2015).
Using two-level clustering and star charts, our algorithm helps to obtain a clear overview of how all markers behave on all cells, and to detect subsets that might otherwise be missed. Our algorithm consists of four steps: pre-processing the data, building a self-organizing map, building a minimal spanning tree and computing a meta-clustering result.

RESULTS & DISCUSSION

Although our results are quite similar to those of SPADE, another state-of-the-art algorithm for the visualization of cytometry data, they can be computed much faster and with less memory. By providing star charts and an automatic meta-clustering step, much more information can be visualised in a single tree than with the SPADE algorithm. Additionally, multiple states (e.g. healthy versus diseased mice) can be compared with one another, and the differences between the two states can be visualized via star charts.

At this conference, we would like to demonstrate a recently developed web interface to the underlying R functionality. This interface allows users to upload cytometry data, run the aforementioned analysis, compare different cell states and explore the results via interactive visualizations, all from the comfort of the browser.

FIGURE 1. Example of a FlowSOM star chart.

REFERENCES

Van Gassen, et al. (2015). FlowSOM: Using self-organizing maps for visualization and interpretation of cytometry data. Cytometry, 87, 636-645.
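
The four steps listed in the Methods section of P53 can be sketched as a toy pipeline. This is not the FlowSOM code or its API (the abstract states that the real functionality is implemented in R). It is a small standard-library Python illustration in which every function name, the z-score pre-processing, the grid size, and the learning-rate schedule are assumptions. For the meta-clustering step the sketch swaps in a simpler technique: cutting the longest edges of the minimal spanning tree, which is equivalent to single-linkage clustering of the map nodes.

```python
import math
import random

def standardize(cells):
    """Step 1 (assumed pre-processing): z-score every marker column."""
    cols = list(zip(*cells))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in cells]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(nodes, cell):
    """Index of the best-matching map node for one cell."""
    return min(range(len(nodes)), key=lambda i: sq_dist(nodes[i], cell))

def train_som(cells, rows=3, cols=3, epochs=10, seed=0):
    """Step 2: fit a small rectangular self-organizing map."""
    rng = random.Random(seed)
    nodes = [list(rng.choice(cells)) for _ in range(rows * cols)]
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    total, step = epochs * len(cells), 0
    for _ in range(epochs):
        for cell in rng.sample(cells, len(cells)):
            frac = step / total
            lr = 0.5 * (1.0 - frac) + 0.01                       # decaying step size
            radius = max(rows, cols) / 2.0 * (1.0 - frac) + 0.5  # shrinking neighborhood
            bmu = nearest(nodes, cell)
            for i, node in enumerate(nodes):
                grid_d2 = ((grid[i][0] - grid[bmu][0]) ** 2
                           + (grid[i][1] - grid[bmu][1]) ** 2)
                h = math.exp(-grid_d2 / (2.0 * radius ** 2))
                for k in range(len(node)):
                    node[k] += lr * h * (cell[k] - node[k])
            step += 1
    return nodes

def minimal_spanning_tree(nodes):
    """Step 3: Prim's algorithm on the complete graph of map nodes."""
    in_tree, edges = {0}, []
    while len(in_tree) < len(nodes):
        d, i, j = min((sq_dist(nodes[i], nodes[j]), i, j)
                      for i in in_tree
                      for j in range(len(nodes)) if j not in in_tree)
        edges.append((i, j, math.sqrt(d)))
        in_tree.add(j)
    return edges

def meta_cluster(n_nodes, edges, k):
    """Step 4 (stand-in): cut the k-1 longest tree edges, label the components."""
    kept = sorted(edges, key=lambda e: e[2])[:len(edges) - (k - 1)]
    parent = list(range(n_nodes))
    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i
    for i, j, _ in kept:
        parent[find(i)] = find(j)
    roots = sorted({find(i) for i in range(n_nodes)})
    return [roots.index(find(i)) for i in range(n_nodes)]
```

A typical call chain would be `nodes = train_som(standardize(cells))`, then `edges = minimal_spanning_tree(nodes)` and `meta = meta_cluster(len(nodes), edges, k)`. Each cell then inherits the meta-cluster of its nearest node, `meta[nearest(nodes, cell)]`. Because clustering operates on a handful of map nodes rather than on every cell, the expensive steps stay cheap even for very large samples, which is consistent with the speed and memory claims in the abstract.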