
BeNeLux Bioinformatics Conference – Antwerp, December 7-8 2015

Abstract ID: P

Poster

10th Benelux Bioinformatics Conference bbc 2015

P19. MSQROB: AN R/BIOCONDUCTOR PACKAGE FOR ROBUST RELATIVE QUANTIFICATION IN LABEL-FREE MASS SPECTROMETRY-BASED QUANTITATIVE PROTEOMICS

Ludger Goeminne^1,2,3,*, Kris Gevaert^2,3 & Lieven Clement^1

^1 Department of Applied Mathematics, Computer Science and Statistics, Ghent University; ^2 VIB Medical Biotechnology Center; ^3 Department of Biochemistry, Ghent University. * ludger.goeminne@UGent.be

MSqRob is an R/Bioconductor package that uses robust ridge regression on peptide-level data for the relative quantification of proteins in label-free, data-dependent acquisition (DDA), mass spectrometry (MS)-based proteomic experiments. Statistical methods that perform inference at the peptide level have been shown to outperform workflows that summarize peptide intensities prior to inference. MSqRob improves upon existing peptide-level methods with three modular extensions: (1) ridge regression, (2) empirical Bayes variance estimation and (3) M-estimation with Huber weights. These extensions make MSqRob less sensitive to outliers and missing peptides, so that more proteins can be processed. Our software provides streamlined data analysis pipelines for experiments with simple layouts as well as for more complex multi-factorial designs. Using a spike-in dataset, we illustrate that MSqRob yields more stable protein fold change estimates and improves the differential abundance (DA) ranking.

INTRODUCTION

In a typical label-free DDA LC-MS/MS-based proteomic workflow, proteins are digested into peptides, which are separated by RP-HPLC and analyzed by a mass spectrometer. However, several issues inherent to this protocol, such as missing peptides and outlying intensities, make data analysis non-trivial. Most common data analysis procedures are summarization-based: they first summarize peptide intensities into protein-level values and only then test for DA. We have previously shown that inference at the peptide level outperforms these summarization-based approaches (Goeminne et al., 2015). Yet even peptide-level pipelines are sensitive to outliers and prone to overfitting. Here, we present MSqRob, an R/Bioconductor package that starts from peptide-level data and provides robust inference on DA at the protein level.

METHODS

Dataset. To demonstrate the performance of our package, we use the CPTAC dataset, in which 48 known human proteins were spiked in at different concentrations into a yeast proteome background. Ideally, when different spike-in conditions are compared, only the human proteins should be flagged as differentially abundant.

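Competing analytical methods. We compare MSqRob with MaxLFQ+Perseus, which first summarizes peptide intensities into protein intensities (MaxLFQ) and then tests for DA with pairwise t-tests (Perseus), and with the simple peptide-level linear model (LM) described below.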
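For contrast with the peptide-level models below, here is a minimal sketch of the testing step of a summarization-based workflow. It assumes that each protein has already been summarized to one log2 intensity per sample and that a pooled-variance Student t-test is used. The exact Perseus settings behind the comparison are not stated in this abstract, and the function name `pooled_t_statistic` is hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t_statistic(group_a: Sequence[float], group_b: Sequence[float]) -> Tuple[float, int]:
    """Student two-sample t statistic with pooled variance.

    group_a, group_b: summarized protein-level log2 intensities, one value per
    sample (assumed input format). Returns (t, degrees of freedom); t > 0 means
    group_b has the higher mean.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each condition needs at least two summarized values")
    df = n_a + n_b - 2
    # Pooled variance: degrees-of-freedom-weighted average of the two sample variances.
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = math.sqrt(pooled * (1.0 / n_a + 1.0 / n_b))
    t = (mean(group_b) - mean(group_a)) / standard_error
    return t, df


if __name__ == "__main__":
    # Hypothetical summarized log2 intensities of one protein in two conditions.
    t, df = pooled_t_statistic([20.0, 20.2, 19.8], [21.0, 21.2, 20.8])
    print(f"t = {t:.3f} on {df} df")  # prints "t = 6.124 on 4 df"
```

The p-value would then come from the t distribution with `df` degrees of freedom. All peptide-level information has already been collapsed into the per-sample summaries before this test is run.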

LM model. Generally, peptide-based models are constructed as follows:

$$y_{ijklmn} = \mathrm{treat}_{ij} + \mathrm{pep}_{ik} + \mathrm{biorep}_{il} + \mathrm{techrep}_{im} + \varepsilon_{ijklmn}$$

with $y_{ijklmn}$ the $n$-th $\log_2$-transformed, normalized feature intensity for the $i$-th protein under the $j$-th treatment $\mathrm{treat}_{ij}$, the $k$-th peptide sequence $\mathrm{pep}_{ik}$, the $l$-th biological repeat $\mathrm{biorep}_{il}$ and the $m$-th technical repeat $\mathrm{techrep}_{im}$, and $\varepsilon_{ijklmn}$ a normally distributed error term with mean zero and variance $\sigma_i^2$.
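To make the peptide-based model concrete, the sketch below encodes one protein's features as rows of a one-hot design matrix. Each level of treat, pep, biorep and techrep gets its own 0/1 column. This is an illustration under simplifying assumptions: the dictionary input format and the function name `one_hot_design` are hypothetical, and nesting of repeats within treatments is not modelled.

```python
from typing import Dict, List, Sequence, Tuple

# Factor names follow the LM model: treat, pep, biorep, techrep.
FACTORS = ("treat", "pep", "biorep", "techrep")


def one_hot_design(rows: Sequence[Dict[str, str]]) -> Tuple[List[str], List[List[float]]]:
    """Full one-hot design matrix for the features of one protein.

    Every level of every factor gets its own 0/1 column; no reference
    level is dropped and no intercept is added.
    """
    columns: List[str] = []
    for factor in FACTORS:
        levels = sorted({row[factor] for row in rows})
        columns.extend(f"{factor}:{level}" for level in levels)
    position = {name: j for j, name in enumerate(columns)}
    matrix: List[List[float]] = []
    for row in rows:
        x = [0.0] * len(columns)
        for factor in FACTORS:
            x[position[f"{factor}:{row[factor]}"]] = 1.0
        matrix.append(x)
    return columns, matrix


if __name__ == "__main__":
    # Hypothetical features: two treatments, two peptides, two biological repeats, one technical repeat.
    features = [
        {"treat": "A", "pep": "PEPTIDEK", "biorep": "1", "techrep": "1"},
        {"treat": "A", "pep": "SAMPLER", "biorep": "1", "techrep": "1"},
        {"treat": "B", "pep": "PEPTIDEK", "biorep": "2", "techrep": "1"},
        {"treat": "B", "pep": "SAMPLER", "biorep": "2", "techrep": "1"},
    ]
    cols, X = one_hot_design(features)
    print(cols)
    for x in X:
        print(x)
```

Every level of every factor gets a column, so within each row the columns of any single factor sum to one. The columns are therefore linearly dependent. Ordinary least squares needs an extra constraint, such as dropping a reference level, to give unique estimates; a ridge penalty also removes this ambiguity.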

MSqRob. MSqRob adds the following improvements to the LM model:

1. Ridge regression: shrink parameter estimates towards 0 by adding a ridge penalty term to the loss function.
2. Empirical Bayes (EB) variance estimation: stabilize variance estimation by borrowing information across proteins, shrinking each protein's variance towards the pooled variance.
3. M-estimation with Huber weights: down-weight observations with large residuals.
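The sketch below shows the three extensions for a single protein. It is written from the descriptions above, not taken from MSqRob's source code. Ridge regression and Huber M-estimation are combined through iteratively reweighted least squares (IRLS). The EB step is the usual degrees-of-freedom-weighted average of a protein's own variance and a pooled prior variance.

Several choices here are illustrative assumptions:
- the fixed penalty `lam`;
- the Huber constant 1.345;
- the MAD-based residual scale;
- the unpenalized intercept in column 0;
- the prior hyperparameters `prior_s2` and `prior_d`, which an EB method would estimate across all proteins.

All function names are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple

HUBER_K = 1.345  # conventional Huber tuning constant (assumed, not taken from MSqRob)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_huber_fit(X: Sequence[Sequence[float]], y: Sequence[float], lam: float,
                    n_iter: int = 20) -> Tuple[List[float], List[float]]:
    """Ridge regression with Huber weights, fitted by iteratively reweighted least squares.

    Column 0 of X is an unpenalized intercept; every other coefficient gets the
    penalty lam * beta_j**2. Returns (coefficients, final Huber weights).
    """
    n, p = len(y), len(X[0])
    w = [1.0] * n
    beta = [0.0] * p
    for _ in range(n_iter):
        # Normal equations of the penalized weighted loss: (X'WX + lam * D) beta = X'Wy,
        # where D is the identity matrix with its intercept entry set to 0.
        a = [[sum(w[i] * X[i][j] * X[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
        for j in range(1, p):
            a[j][j] += lam
        b = [sum(w[i] * X[i][j] * y[i] for i in range(n)) for j in range(p)]
        beta = solve(a, b)
        resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        # Robust residual scale: median absolute deviation, rescaled for normal errors.
        center = statistics.median(resid)
        scale = statistics.median([abs(r - center) for r in resid]) / 0.6745
        if scale == 0.0:
            break
        # Huber weights: 1 for small standardized residuals, k/|u| for large ones.
        w = [1.0 if abs(r) / scale <= HUBER_K else HUBER_K * scale / abs(r) for r in resid]
    return beta, w


def moderated_variance(s2: float, d: float, prior_s2: float, prior_d: float) -> float:
    """EB shrinkage: df-weighted average of a protein's variance s2 (d df) and a pooled prior."""
    return (prior_d * prior_s2 + d * s2) / (prior_d + d)


if __name__ == "__main__":
    # Hypothetical toy protein: intercept plus an indicator for condition B.
    X = [[1.0, 0.0] for _ in range(4)] + [[1.0, 1.0] for _ in range(4)]
    y = [20.0, 20.1, 19.9, 20.0, 21.0, 21.1, 20.9, 25.0]  # last value is an outlier
    beta, w = ridge_huber_fit(X, y, lam=0.1)
    print("log2 FC (B vs A):", round(beta[1], 3))
    print("Huber weights:", [round(v, 3) for v in w])
    # Hypothetical prior: pooled variance 0.1 worth 4 degrees of freedom.
    print("moderated variance:", moderated_variance(0.5, 2, prior_s2=0.1, prior_d=4))
```

In the toy example, the outlying last observation of condition B receives a small Huber weight. As a result, the estimated log2 fold change stays close to 1 instead of the 2 given by the unweighted difference in means. The EB step pulls a noisy variance of 0.5 towards the prior value of 0.1.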

RESULTS & DISCUSSION

MSqRob takes MaxQuant or Mascot peptide-level data as input. It performs preprocessing and robust model fitting, and returns log2 fold change estimates and FDR-corrected p-values for all model parameters and/or user-specified contrasts. Advanced users have the flexibility to (a) adopt their own preprocessing pipeline (e.g., transformation, normalization, removal of contaminants) and (b) specify the appropriate model structure. Compared to competing methods, MSqRob returns more stable log2 fold change estimates and improves DA ranking (Figure 1). It can also distinguish consistently strong DA from accidental hits caused by outliers or by variances that happen to be small by chance in low-abundance proteins.
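MSqRob reports FDR-corrected p-values, but the abstract does not name the correction procedure. The sketch below assumes the widely used Benjamini-Hochberg step-up procedure, which R's `p.adjust(p, method = "BH")` also implements. The function name is hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

Proteins whose adjusted p-value is at most 0.05 are the ones called DA at an estimated 5% FDR.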

FIGURE 1. Receiver operating characteristic (ROC) curves showing the superior performance of MSqRob compared to a simple linear model (LM) and a summarization-based approach (MaxLFQ+Perseus) when comparing the lowest spike-in concentration 6A with the second lowest spike-in concentration 6B. Stars denote each method's cutoff at an estimated 5% FDR.
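The true status of every protein in the CPTAC data is known: human spike-ins should be flagged, and yeast proteins should not. An ROC curve like the ones in Figure 1 can therefore be traced by walking down each method's DA ranking. The sketch below assumes a score where larger means stronger evidence of DA. The function name is hypothetical, and how ties were handled for the figure is not stated.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], is_spike_in: Sequence[bool]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs obtained by walking down a ranking, most significant first.

    scores: larger means stronger evidence for DA (e.g. -log10 p-value).
    is_spike_in: ground truth, True for human spike-in proteins, False for yeast.
    Ties are not grouped; each protein moves the curve by one step.
    """
    n_pos = sum(1 for flag in is_spike_in if flag)
    n_neg = len(is_spike_in) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both spike-in and background proteins")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i in order:
        if is_spike_in[i]:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


if __name__ == "__main__":
    print(roc_points([5.0, 4.0, 3.0, 2.0], [True, False, True, False]))
```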

REFERENCES

Goeminne LJE et al. Journal of Proteome Research 14, 2457-2465 (2015).
