BeNeLux Bioinformatics Conference – Antwerp, December 7-8 2015<br />
Abstract ID: P<br />
Poster<br />
10th Benelux Bioinformatics Conference bbc 2015<br />
P37. ANALYSIS OF RELATIONSHIP PATTERNS<br />
IN UNASSIGNED MS/MS SPECTRA<br />
Aida Mrzic 1,2,*, Wout Bittremieux 1,2, Trung Nghia Vu 4, Dirk Valkenborg 3,5,6, Bart Goethals 1 & Kris Laukens 1,2.<br />
Advanced Database Research and Modeling (ADReM), University of Antwerp 1 ; Biomedical informatics research center<br />
Antwerpen (biomina) 2 ; Flemish Institute for Technological Research (VITO), Mol 3 ; Karolinska Institutet, Stockholm 4 ;<br />
CFP, University of Antwerp 5 ; I-BioStat, Hasselt University 6 . * aida.mrzic@uantwerpen.be<br />
Tandem mass spectrometry (MS/MS) spectra generated in proteomics experiments often contain a large portion of<br />
unexplained peaks, despite continuous improvements in search engines. Here we use a pattern mining technique to determine<br />
the origin of these unassigned spectra. We discover patterns that indicate the presence of chimeric spectra and missed<br />
post-translational modifications (PTMs).<br />
INTRODUCTION<br />
Although they are a rich source of information, mass<br />
spectra acquired in mass spectrometry proteomics<br />
experiments often contain a significant number of<br />
unexplained peaks, or even remain completely<br />
unidentified. The unexplained fraction of mass spectra<br />
may come from low-quality or chimeric MS/MS spectra,<br />
or unexpected PTMs. To interpret the unexplained data,<br />
we propose a structured analysis of the peaks occurring in<br />
MS/MS spectra. We employ an unsupervised pattern<br />
mining technique (Naulaerts et al., 2015) to discover<br />
which peaks are associated with each other, and therefore<br />
are likely to have a common origin.<br />
METHODS<br />
Frequent itemset mining<br />
The technique we used to discover relationships between<br />
frequently co-occurring peaks in MS/MS data is frequent<br />
itemset mining, a class of data mining techniques that is<br />
specifically designed to discover co-occurring items in<br />
transactional datasets. The typical example of frequent<br />
itemset mining is the discovery of sets of products that are<br />
frequently bought together. Here, every set of products<br />
purchased together represents a single transaction, which<br />
results in a dataset consisting of a large number of<br />
supermarket basket transactions that can be mined for<br />
frequent patterns (Figure 1). In our approach, a transaction<br />
consists of the mass differences between relevant peaks in<br />
the MS/MS spectrum.<br />
FIGURE 1. Frequent itemset mining principle.<br />
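The supermarket-basket principle described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: it simply enumerates every itemset up to a small size and counts how many transactions contain it, whereas practical miners such as Apriori prune the search space. The function name `frequent_itemsets`, the `max_size` cap and the toy baskets are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Count itemsets (up to max_size items) and keep the frequent ones.

    Brute-force enumeration for illustration only; it does not prune
    candidates the way Apriori-style algorithms do.
    """
    counts = Counter()
    for transaction in transactions:
        # De-duplicate and sort so each itemset has one canonical tuple.
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    # Support = number of transactions that contain the itemset.
    return {s: n for s, n in counts.items() if n >= min_support}


# Hypothetical toy baskets, one list per transaction.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["butter", "milk"],
    ["bread", "butter", "jam"],
]
frequent = frequent_itemsets(baskets, min_support=3)
```

With a minimum support of 3, only the pair bread and butter survives among the multi-item sets, which is the kind of co-occurrence pattern the mining step reports.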
Mass difference associations<br />
In order to detect relationships between different types of<br />
mass spectrometry peaks, a distinction is made between<br />
peaks that were relevant for spectrum identification<br />
(assigned peaks) and peaks that were not used for the<br />
identification (unassigned peaks) (Vu et al., 2014). The<br />
mass differences between peaks (either assigned,<br />
unassigned, or both) are then calculated so that for each<br />
MS/MS spectrum in the dataset there is a single<br />
transaction consisting of all its mass differences.<br />
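As a rough sketch of how one spectrum could be turned into such a transaction, the Python snippet below computes all pairwise differences between a list of peak m/z values. Several details are assumptions not stated in the abstract: peaks are treated as singly charged so that m/z differences stand in for mass differences, differences are rounded to two decimals to obtain discrete items, and the function name and example peaks are hypothetical. Which peaks are passed in (assigned, unassigned, or both) is left to the caller.

```python
from itertools import combinations


def mass_difference_transaction(peaks_mz, decimals=2):
    """Build one transaction from the peaks of a single spectrum.

    Assumption: differences are rounded to `decimals` places so that
    nearly equal mass differences map to the same item.
    """
    diffs = {
        round(abs(a - b), decimals)
        for a, b in combinations(peaks_mz, 2)
    }
    return sorted(diffs)


# Hypothetical peak list (m/z values) for one MS/MS spectrum.
spectrum = [100.00, 115.99, 180.00, 195.99]
transaction = mass_difference_transaction(spectrum)
```

Repeating this for every spectrum yields one transaction per spectrum, which is the input format that frequent itemset mining expects.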
After obtaining these transactions for all MS/MS spectra<br />
in the dataset, frequent itemset mining can be employed to<br />
detect relationship patterns (Figure 2). These patterns can<br />
indicate previously unknown characteristics of the spectra,<br />
or even detect novel PTMs.<br />
FIGURE 2. Outline of the approach.<br />
RESULTS & DISCUSSION<br />
In order to evaluate our approach, we used MS/MS<br />
datasets from the PRoteomics IDEntifications (PRIDE)<br />
database (Vizcaino et al., 2013). This database contains a<br />
large number of publicly available datasets from mass<br />
spectrometry-based proteomics experiments. However, the<br />
quality of the submitted datasets varies widely, which<br />
makes the database a suitable candidate for our<br />
pattern mining approach.<br />
Preliminary results show that the detected patterns<br />
capture valid information in a spectrum. The obtained<br />
patterns indicate peaks originating from the same peptide<br />
in the case of chimeric spectra, and mass differences<br />
originating from common PTMs.<br />
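One way to check whether a frequent mass difference could correspond to a common PTM is to compare it against known monoisotopic mass shifts. The Python sketch below is an illustrative assumption, not the procedure used in this work: the small lookup table, the 0.02 Da tolerance and the function name `annotate_mass_difference` are chosen only for the example.

```python
# Monoisotopic mass shifts (Da) of a few common modifications.
KNOWN_SHIFTS = {
    "methylation": 14.0157,
    "oxidation": 15.9949,
    "acetylation": 42.0106,
    "phosphorylation": 79.9663,
}


def annotate_mass_difference(mass_difference, tolerance=0.02):
    """Return names of known shifts within `tolerance` Da of the value.

    The tolerance is an assumed value; a suitable choice depends on
    the mass accuracy of the instrument.
    """
    return [
        name
        for name, shift in KNOWN_SHIFTS.items()
        if abs(mass_difference - shift) <= tolerance
    ]


oxidation_like = annotate_mass_difference(15.99)
phospho_like = annotate_mass_difference(79.97)
unmatched = annotate_mass_difference(64.01)
```

A mass difference that matches no entry, like the last one here, is not necessarily uninformative: it may reflect a modification missing from the table or peaks from a second peptide in a chimeric spectrum.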
REFERENCES<br />
Naulaerts et al. Brief Bioinform, 16(2):216–231 (2015).<br />
Vizcaino et al. Nucleic Acids Res, 41(D1):D1063–D1069 (2013).<br />
Vu et al. Proteome Science, 12:54 (2014).<br />