bbc 2015

Recommendations

Info

BeNeLux Bioinformatics Conference – Antwerp, December 7-8 2015 Abstract ID: P Poster 10th Benelux Bioinformatics Conference bbc 2015 P48. NGOME: PREDICTION OF NON-ENZYMATIC PROTEIN DEAMIDATION FROM SEQUENCE-DERIVED SECONDARY STRUCTURE AND INTRINSIC DISORDER J. Ramiro Lorenzo 1 , Leonardo G. Alonso 2 & Ignacio E. Sánchez 1* . Protein Physiology Laboratory, Facultad de Ciencias Exactas y Naturales and IQUIBICEN - CONICET, Universidad de Buenos Aires, Argentina 1 ; Protein Structure-Function and Engineering Laboratory, Fundación Instituto Leloir and IIBBA - CONICET, Buenos Aires, Argentina 2 . *isanchez@qb.fcen.uba.ar Asparagine residues in proteins undergo spontaneous deamidation, a post-translational modification that may act as a molecular clock for the regulation of protein function and turnover. Asparagine deamidation is modulated by protein local sequence, secondary structure and hydrogen bonding. We present NGOME, an algorithm able to predict non - enzymatic deamidation of internal asparagine residues in proteins, in the absence of structural data, from sequence based predictions of secondary structure and intrinsic disorder. NGOME may help the user identify deamidation-prone asparagine residues, often related to protein gain of function, protein degradation or protein misfolding in pathological processes. INTRODUCTION Protein deamidation is a post-translational modification in which the side chain amide group of a glutamine or asparagine (Asn) residue is transformed into an acidic carboxylate group. Deamidation often, but not always, leads to loss of protein function 1,2 . Deamidation rates in proteins vary widely, with halftimes for particular Asn residues ranging from several days to years. In contrast with the ubiquity and importance of Asn deamidation, there is currently no publicly available algorithm for the prediction of Asn deamidation A structure-based algorithm was published 3 , but is no longer available online and is not useful for proteins of unknown structure or those that are intrinsically disordered. METHODS Dataset. We collected from the literature experimental reports of deamidation of Asn residues in proteins using mass spectrometry or Edman sequencing. Since deamidation rates depend strongly on pH and temperature, we only included experiments at neutral or slightly basic pH and up to 313K. An Asn residue was considered a positive if unequivocal change to aspartic or isoaspartic residue was observed. Asn residues for which direct experimental evidence was not obtained were not taken into account. NGOME training. We trained the algorithm by randomly splitting the dataset into training and test sets 100 times, while keeping a similar number of positive and negative Asn-Xaa dipeptides in the two sets. For each splitting, we selected the weights for disorder 4 and alpha helix prediction 5 in NGOME algorithm to maximize the area under the ROC curve for the training set. For the test set, the area under the ROC curve for NGOME was larger than for sequence-based prediction 97 out of 100 times. Finally, we selected the average values of weights for NGOME. RESULTS & DISCUSSION Both protein sequence and structure can influence Asn deamidation kinetics. In the absence of secondary and 5. Cole, C., et al. Nucleic Acids Res 36:W197-201 (2008). tertiary structure, Asn deamidation rates are governed by the identity of the N+1 amino acid 3 . In model peptides, the Asn-Gly dipeptide is by far the fastest to deamidate, with bulky N+1 side chains generally slowing down the reaction. Several structural features decreasing Asn deamidation rates have also been identified, including alpha helix formation and hydrogen bond formation by the Asn side chain, the N+1 backbone amide and the neighbouring residues 3 . We compiled a database of 281 Asn residues (67 positives and 214 negatives) in 39 proteins to train NGOME. We computed t50 for all Asn in the dataset and generated a ROC curve by considering as positives Asn residues with different values of t50. The area under the ROC curve is larger for the NGOME predictions (0.9640) than for the sequence-based predictions (0.9270) (p-value 6×10 -3 ). NGOME also performs better for threshold value s yielding few false positives. NGOME can also discriminate between positive and negative Asn-Gly dipeptides whereas sequence-based prediction can not. The area under the ROC curve is 0.7051 for the NGOME predictions, larger than the random value of 0.5 for sequence-based prediction (p-value 9×10 –3 ). Since NGOME requires only a protein sequence as an input and not a three-dimensional structure, we envision that GNOME will be useful to systematically evaluate whole proteome data and in the study of intrinsically disordered proteins for which the structural data is scarce. NGOME is freely available as a webserver at the National EMBnet node Argentina, URL: http://www.embnet.qb.fcen.uba.ar/ in the subpage “Protein and nucleic acid structure and sequence analysis”. REFERENCES 1. Curnis, F., et al. J Biol Chem 281:36466-36476 (2006). 2. Reissner, K.J. and Aswad, D.W. Cell Mol Life Sci 60:1281 -1295 (2003). 3. Robinson, N.E. and Robinson, A.B. Proc Natl Acad Sci U S A 98:4367-4372 (2001). 4. Dosztanyi, Z., et al. Bioinformatics 21:3433-3434 (2005). 92
BeNeLux Bioinformatics Conference – Antwerp, December 7-8 2015 Abstract ID: P Poster 10th Benelux Bioinformatics Conference bbc 2015 P49. OPTIMAL DESIGN OF SRM ASSAYS USING MODULAR EMPIRICAL MODELS Jérôme Renaux 1,* , Alexandros Sarafianos 1 , Kurt De Grave 1 & Jan Ramon 1 . Department of Computer Science, KU Leuven. 1 * Jerome.renaux@cs.kuleuven.be Targeted proteomics techniques such as Selected Reaction Monitoring (SRM) have become very popular for protein quantification due to their high sensitivity and reproducibility. However, these rely on the selection of optimal transitions, which are not always known in advance and may require expensive and time-consuming discovery experiments to identify. We propose a computer program for the automated identification of optimal transitions using machine learning and show encouraging results when compared to a widely used spectral library. INTRODUCTION A major issue with both SRM is to know which transitions to monitor in order to maximally detect a specific protein, these being different from one protein to another. Good candidates are transitions whose chemical properties will make them likely to occur and easy to detect by the mass spectrometer, while being sufficiently specific indicators of their parent protein. Traditionally, targeted proteomics assays, which consist of lists of ions or transitions to monitor, are designed through costly exploratory experiments. Recently, attempts have been made to produce software to help design optimal assays. These efforts rely on some extent on collaborative databases of mass spectra which are mined to identify the best possible peptides to include in the assays. While successful, these approaches still depend on past exploratory analyses and on the coverage of the exploited databases. Therefore, their performance decrease in cases where such databases cannot be leveraged, such as when dealing with little-studied organisms or rare, lowabundance proteins. We propose an approach called SIMPOPE (Sequence of Inductive Models for the Prediction and Optimization of Proteomics Experiments) that models all the steps of the typical tandem mass spectrometry (MS/MS) workflow in order to accurately predict the properties of peptide and fragment ions within a given proteome, and subsequently identify optimal assays among them. METHODS SIMPOPE consists of a sequential suite of predictive models for each step of the MS/MS workflow. It exploits knowledge from public databases and combines it with the generalizing power of machine learning models to compensate for noisy or missing data. All models are probabilistic, allowing to keep track of the inherent uncertainty of the successive predictions and to weight the results accordingly for the assay prediction. Enzymatic cleavage is modelled using CP-DT(Fannes et al., 2013), which models the behaviour of the trypsin enzyme using random forests. Retention time prediction is achieved using the Elude tool from the Percolator suite (Moruz et al., 2010). The charge distribution of electrospray precursor ions is also modelled using random forests trained on experimental data mined from PRIDE (Vizcaino et al., 2013). Fragmentation patterns and product ion intensity are predicted with the help of random forest models trained on MS-LIMS data (Degroeve & Martens 2013; De Grave et al., 2014). Finally, prior knowledge about the abundance of proteins within a given proteome is incorporated as prior probabilities, obtained when available from PaxDB. On the human proteome, these steps yield a total of 321 000 000 transitions together with their relevant chemical properties. We then compute a score for every single transition, based on these properties and on their aliasing with other transitions in terms of Q1 and Q3 m/z. RESULTS & DISCUSSION We validated our approach by computing scores for 2000 reference transitions from the SRMAtlas database (Picotti et al., 2014). Based on these scores, we can rank the reference transitions among all possible transitions. Intuitively, reference transitions should rank high, and therefore have a low rank (ideally, in the top five). Based on the average number of transitions per protein in our reference set, a perfect median rank would be 3.2, while a totally random scoring system should yield a median rank of 151. The approach we propose achieved a median rank of 15, signifying that using our scoring method, 50% of the reference transitions are ranked in the top 15. This result is encouraging as it shows that the scores predicted by SIMPOPE do correlate with the quality of the transitions. We can subsequently use that score as a feature to train an additional model on top of the ones described here to refine the assay prediction process (further results on the poster). REFERENCES Degroeve, S. & Martens, L. MS2PIP: a tool for MS/MS peak intensity prediction. Bioinformatics, 29, pp.3199–203 (2013). Fannes, T. et al. Journal of Proteome Research, 12(5), pp.2253–2259 (2013). De Grave, K. De et al. Prediction of peptide fragment ion intensity : a priori partitioning reconsidered. International Mass Spectrometry Conference 2014, (2014). Moruz, L., Tomazela, D. & Käll, L. Training, selection, and robust calibration of retention time models for targeted proteomics. Journal of Proteome Research, 9(10), pp.5209–5216 (2010). Picotti, P. et al. A complete mass-spectrometric map of the yeast proteome applied to quantitative trait analysis. Nature, 494(7436), pp.266–270 (2014). Vizcaino, J. a. et al. The Proteomics Identifications (PRIDE) database and associated tools: status in 2013. Nucleic Acids Research, 41(D1), pp.D1063–D1069 (2013). 93
Page 1 and 2:
10 th Benelux Bioinformatics Confer
Page 3 and 4:
10th Benelux Bioinformatics Confere
Page 5 and 6:
Page 7 and 8:
Page 9 and 10:
Page 11 and 12:
Page 13 and 14:
Page 15 and 16:
Page 17 and 18:
Page 19 and 20:
BeNeLux Bioinformatics Conference -
Page 21 and 22:
Page 23 and 24:
Page 25 and 26:
Page 27 and 28:
Page 29 and 30:
Page 31 and 32:
Page 33 and 34:
Page 35 and 36:
Page 37 and 38:
Page 39 and 40:
Page 41 and 42: BeNeLux Bioinformatics Conference -
Page 91: BeNeLux Bioinformatics Conference -
Page 115: 10th Benelux Bioinformatics Confere
show all

bbc 2015

Create successful ePaper yourself

Delete template?

Save as template?