BeNeLux Bioinformatics Conference – Antwerp, December 7-8 2015<br />
Abstract ID: P<br />
Poster<br />
10th Benelux Bioinformatics Conference bbc 2015<br />
P19. MSQROB: AN R/BIOCONDUCTOR PACKAGE FOR ROBUST RELATIVE<br />
QUANTIFICATION IN LABEL-FREE MASS SPECTROMETRY-BASED<br />
QUANTITATIVE PROTEOMICS<br />
Ludger Goeminne 1,2,3*, Kris Gevaert 2,3 & Lieven Clement 1.<br />
Department of Applied Mathematics, Computer Science and Statistics, Ghent University 1; VIB Medical Biotechnology<br />
Center 2; Department of Biochemistry, Ghent University 3. * ludger.goeminne@UGent.be<br />
MSqRob is an R/Bioconductor package that uses robust ridge regression on peptide-level data for robust relative<br />
quantification of proteins in label-free data-dependent acquisition (DDA) mass spectrometry (MS)-based proteomic<br />
experiments. Statistical methods that perform inference at the peptide level have been shown to outperform workflows that<br />
summarize peptide intensities prior to inference. MSqRob improves upon existing peptide-level methods with three<br />
modular extensions: (1) ridge regression, (2) empirical Bayes variance estimation and (3) M-estimation with Huber<br />
weights. These extensions make MSqRob less sensitive to outliers and missing peptides, enabling more proteins to be<br />
processed. Our software provides streamlined data analysis pipelines for experiments with simple layouts as well as for<br />
more complex multi-factorial designs. Using a spike-in dataset, we illustrate that MSqRob yields more stable protein fold<br />
change estimates and improves the differential abundance (DA) ranking.<br />
INTRODUCTION<br />
In a typical label-free DDA LC-MS/MS-based proteomic<br />
workflow, proteins are digested to peptides, separated by<br />
RP-HPLC and analyzed by a mass spectrometer. However,<br />
several issues inherent to the protocol make data analysis<br />
non-trivial. Most of the common data analysis procedures<br />
use summarization-based workflows. We have previously<br />
shown that inference at the peptide level outperforms these<br />
summarization-based approaches (Goeminne et al., 2015).<br />
However, even these pipelines are sensitive to outliers and<br />
suffer from overfitting. Here, we present MSqRob, an<br />
R/Bioconductor package that starts from peptide-level data<br />
and provides robust inference on DA at the protein level.<br />
METHODS<br />
Dataset. To demonstrate the performance of our package,<br />
we use the CPTAC dataset, in which 48 known human<br />
proteins were spiked-in at different concentrations in a<br />
yeast proteome background. Ideally, when comparing<br />
different spike-in conditions, only the human proteins<br />
should be flagged as differentially abundant.<br />
Competing analytical methods. We compare MSqRob with<br />
MaxLFQ+Perseus, which summarizes peptide data before<br />
applying pairwise t-tests, and with the peptide-based<br />
linear model (LM) described below.<br />
LM model. Generally, peptide-based models are<br />
constructed as follows:<br />
$$y_{ijklmn} = \mathrm{treat}_{ij} + \mathrm{pep}_{ik} + \mathrm{biorep}_{il} + \mathrm{techrep}_{im} + \varepsilon_{ijklmn}$$<br />
with $y_{ijklmn}$ the $n$th $\log_2$-transformed normalized feature<br />
intensity for the $i$th protein under the $j$th treatment $\mathrm{treat}_{ij}$,<br />
the $k$th peptide sequence $\mathrm{pep}_{ik}$, the $l$th biological repeat<br />
$\mathrm{biorep}_{il}$ and the $m$th technical repeat $\mathrm{techrep}_{im}$, and<br />
$\varepsilon_{ijklmn}$ a normally distributed error term with mean zero<br />
and variance $\sigma_i^2$.<br />
MSqRob. MSqRob adds the following improvements to<br />
the LM model:<br />
1. Ridge regression: shrink parameter estimates<br />
towards 0 by adding a ridge penalty term to the<br />
loss function.<br />
2. Empirical Bayes (EB) variance estimation:<br />
stabilize variance estimates by borrowing<br />
information across proteins, shrinking individual<br />
variances towards the pooled variance.<br />
3. M-estimation with Huber weights: down-weight<br />
observations with large residuals.<br />
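Each of the three extensions can be illustrated in isolation. The sketch below is not MSqRob's implementation. It assumes a ridge problem with a single effect, a variance shrinkage weighted by degrees of freedom (in the style of moderated variances; the abstract gives no formula, so this is an assumption), and the conventional Huber tuning constant k = 1.345. All function names are hypothetical.<br />

```python
def ridge_mean(values, penalty):
    """Ridge estimate of one effect b: argmin sum((y - b)^2) + penalty * b^2.

    The closed form sum(y) / (n + penalty) shrinks the estimate towards 0.
    """
    return sum(values) / (len(values) + penalty)


def eb_variance(s2, df, s2_prior, df_prior):
    """Shrink a per-protein variance s2 towards a prior (pooled) variance.

    Assumed form: average of both variances weighted by degrees of freedom.
    """
    return (df_prior * s2_prior + df * s2) / (df_prior + df)


def huber_weight(residual, scale, k=1.345):
    """Huber weight: 1 for small standardized residuals, k / |r| beyond k."""
    r = abs(residual) / scale
    return 1.0 if r <= k else k / r


def huber_location(values, scale, k=1.345, iterations=50):
    """M-estimate of a location by iteratively reweighted averaging."""
    estimate = sorted(values)[len(values) // 2]  # start near the median
    for _ in range(iterations):
        weights = [huber_weight(v - estimate, scale, k) for v in values]
        estimate = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return estimate


# One outlying intensity (10.0) among values close to 1.0.
intensities = [1.0, 1.1, 0.9, 10.0]
plain_mean = sum(intensities) / len(intensities)
robust_mean = huber_location(intensities, scale=0.1)
print(plain_mean, robust_mean)
```

In this toy example the outlier pulls the plain mean far from the bulk of the data, whereas the Huber-weighted estimate stays close to it.<br />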
RESULTS & DISCUSSION<br />
MSqRob takes MaxQuant or Mascot peptide-level data as<br />
input. It performs preprocessing and robust model fitting, and<br />
returns log2 fold change estimates and FDR-corrected<br />
p-values for all model parameters and/or user-specified<br />
contrasts. Advanced users can (a) adopt<br />
their own preprocessing pipeline (e.g. transformation,<br />
normalization, removal of contaminants) and (b) specify the<br />
appropriate model structure. Compared with competing<br />
methods, MSqRob returns more stable log2 fold change<br />
estimates, improves DA ranking (Figure 1) and can<br />
distinguish consistently strong DA from an accidental<br />
hit caused by outliers or by a variance that is small by<br />
chance in low-abundance proteins.<br />
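The abstract does not state which FDR procedure is used. As an illustration only, the sketch below assumes the Benjamini-Hochberg step-up adjustment, a common choice for FDR-corrected p-values. The function name and the example p-values are hypothetical.<br />

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four proteins.
raw = [0.01, 0.04, 0.03, 0.5]
adj = benjamini_hochberg(raw)
flagged = [i for i, q in enumerate(adj) if q <= 0.05]
print(adj, flagged)
```

With these toy values only the first protein remains below the 5% threshold after adjustment.<br />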
FIGURE 1. Receiver operating characteristic (ROC) curves showing the<br />
superior performance of MSqRob compared with a simple linear model<br />
(LM) and a summarization-based approach (MaxLFQ+Perseus) when<br />
comparing the lowest spike-in concentration (6A) with the second-lowest<br />
spike-in concentration (6B). Stars denote each method's cut-off at an<br />
estimated 5% FDR.<br />
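Because the spiked-in human proteins are known, a ranked list of proteins can be turned into an ROC curve. The sketch below shows one way to do this, assuming proteins are ranked by p-value and ties are ignored. The function names and the toy data are hypothetical and are not taken from the CPTAC analysis.<br />

```python
def roc_points(p_values, is_spiked):
    """Return (FPR, TPR) pairs while walking the list from most to least significant.

    Assumes at least one spiked-in and one background protein.
    """
    positives = sum(is_spiked)
    negatives = len(is_spiked) - positives
    ranked = sorted(zip(p_values, is_spiked), key=lambda pair: pair[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, spiked in ranked:
        if spiked:
            tp += 1  # a human (spiked-in) protein: true positive
        else:
            fp += 1  # a yeast (background) protein: false positive
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy example: two spiked-in proteins ranked ahead of two background proteins.
curve = roc_points([0.001, 0.2, 0.01, 0.5], [True, False, True, False])
print(curve, auc(curve))
```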
REFERENCES<br />
Goeminne LJE et al. Journal of Proteome Research 14, 2457-2465 (2015).<br />