
BeNeLux Bioinformatics Conference – Antwerp, December 7-8 2015

Abstract ID: P

Poster

10th Benelux Bioinformatics Conference bbc 2015

P37. ANALYSIS OF RELATIONSHIP PATTERNS IN UNASSIGNED MS/MS SPECTRA

Aida Mrzic 1,2*, Wout Bittremieux 1,2, Trung Nghia Vu 4, Dirk Valkenborg 3,5,6, Bart Goethals 1 & Kris Laukens 1,2.

Advanced Database Research and Modeling (ADReM), University of Antwerp 1; Biomedical informatics research center Antwerpen (biomina) 2; Flemish Institute for Technological Research (VITO), Mol 3; Karolinska Institutet, Stockholm 4; CFP, University of Antwerp 5; I-BioStat, Hasselt University 6. * aida.mrzic@uantwerpen.be

Tandem mass spectrometry (MS/MS) spectra generated in proteomics experiments often contain a large portion of unexplained peaks, despite continuous improvements in search engines. Here we use a pattern mining technique to determine the origin of these unexplained peaks. We discover patterns that indicate the presence of chimeric spectra and missed post-translational modifications (PTMs).

INTRODUCTION

Although they are a rich source of information, mass spectra acquired in mass spectrometry proteomics experiments often contain a significant number of unexplained peaks, or even remain completely unidentified. The unexplained fraction of mass spectra may come from low-quality or chimeric MS/MS spectra, or from unexpected PTMs. To interpret the unexplained data, we propose a structured analysis of the peaks occurring in MS/MS spectra. We employ an unsupervised pattern mining technique (Naulaerts et al., 2015) to discover which peaks are associated with each other and are therefore likely to have a common origin.

METHODS

Frequent itemset mining

The technique we used to discover relationships between frequently co-occurring peaks in MS/MS data is frequent itemset mining, a class of data mining techniques specifically designed to discover co-occurring items in transactional datasets. The typical example of frequent itemset mining is the discovery of sets of products that are frequently bought together. Here, every set of products purchased together represents a single transaction, which results in a dataset consisting of a large number of supermarket basket transactions that can be mined for frequent patterns (Figure 1). In our approach, a transaction consists of the mass differences between relevant peaks in the MS/MS spectrum.

FIGURE 1. Frequent itemset mining principle.
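The supermarket example can be sketched in a few lines of Python. This is a minimal, naive illustration that enumerates every candidate itemset up to a fixed size and counts the transactions containing it; it is not the implementation used in this work. The function name `frequent_itemsets`, the parameters `min_support` and `max_size`, and the basket contents are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Count itemsets of up to max_size items and keep the frequent ones.

    Returns a dict mapping each itemset (a sorted tuple of items) to the
    number of transactions that contain it, restricted to itemsets found
    in at least min_support transactions.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # deduplicate, fix an item order
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical supermarket baskets, one set of products per transaction.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk", "eggs"},
]

frequent = frequent_itemsets(baskets, min_support=2)
for itemset, support in sorted(frequent.items()):
    print(itemset, support)
```

Exhaustive enumeration grows quickly with transaction size. Dedicated algorithms such as Apriori or FP-growth prune the candidate space instead, which matters once each transaction holds many items.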

Mass difference associations

In order to detect relationships between different types of mass spectrometry peaks, a distinction is made between peaks that were relevant for spectrum identification (assigned peaks) and peaks that were not used for the identification (unassigned peaks) (Vu et al., 2014). The mass differences between peaks (either assigned, unassigned, or both) are then calculated, so that each MS/MS spectrum in the dataset yields a single transaction consisting of all its mass differences.
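The construction of one transaction per spectrum can be sketched as follows. This is a hedged illustration, not the authors' code: the function name `mass_difference_transaction`, the parameters `bin_width` and `max_difference`, and the peak values are hypothetical. Because itemset mining needs discrete items, the sketch rounds each difference to an integer bin index; the abstract does not state how differences are discretized, so this binning is an assumption.

```python
from itertools import combinations


def mass_difference_transaction(peak_masses, bin_width=0.01, max_difference=300.0):
    """Build one transaction from the pairwise mass differences of a spectrum.

    peak_masses: masses of the peaks to compare (assigned, unassigned, or
        both, as chosen by the caller).
    bin_width: assumed discretization step; each difference is stored as
        the integer bin index round(difference / bin_width).
    max_difference: assumed cutoff; larger differences are ignored.
    """
    transaction = set()
    for low, high in combinations(sorted(peak_masses), 2):
        difference = high - low
        if difference <= max_difference:
            transaction.add(round(difference / bin_width))
    return transaction


# Hypothetical peak masses for a single spectrum.
spectrum = [100.0, 171.04, 258.07]
transaction = mass_difference_transaction(spectrum)
print(sorted(transaction))
```

Applying the function to every spectrum in a dataset gives a list of transactions that a frequent itemset miner can consume directly.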

After obtaining these transactions for all MS/MS spectra in the dataset, frequent itemset mining can be employed to detect relationship patterns (Figure 2). These patterns can indicate previously unknown characteristics of the spectra, or even reveal novel PTMs.

FIGURE 2. Outline of the approach.

RESULTS & DISCUSSION

In order to evaluate our approach, we used MS/MS datasets from the PRoteomics IDEntifications (PRIDE) database (Vizcaino et al., 2013). This database contains a large number of publicly available datasets from mass-spectrometry-based proteomics experiments. However, the quality of the submitted datasets varies widely, which makes the database a suitable candidate for our pattern mining approach.

Preliminary results show that the detected patterns are able to capture valid information in a spectrum. The obtained patterns indicate peaks originating from the same peptide in the case of chimeric spectra, and mass differences originating from common PTMs.

REFERENCES

Naulaerts et al. Brief Bioinform, 16(2): 216–231 (2015).

Vizcaino et al. Nucleic Acids Res, 41(D1): D1063–D1069 (2013).

Vu et al. Proteome Science, 12: 54 (2014).
