bbc 2015
BeNeLux Bioinformatics Conference – Antwerp, December 7-8 2015
Abstract ID: P
Poster
10th Benelux Bioinformatics Conference bbc 2015
P66. PLADIPUS EMPOWERS UNIVERSAL DISTRIBUTED COMPUTING
Kenneth Verheggen 1,2,3*, Harald Barsnes 4,5, Lennart Martens 1,2,3 & Marc Vaudel 4.
Medical Biotechnology Center, VIB, Ghent, Belgium 1; Department of Biochemistry, Ghent University, Ghent, Belgium 2; Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium 3; Proteomics Unit, Department of Biomedicine, University of Bergen, Norway 4; KG Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, Norway 5. *kenneth.verheggen@vib-ugent.be
Proteomics bioinformatics contributes substantially to an improved understanding of proteomes, but this new, in-depth knowledge comes at the cost of increased computational complexity. Parallelizing work across multiple computers, a strategy termed distributed computing, can handle this increased complexity. However, setting up and maintaining a distributed computing infrastructure requires resources and skills that most research groups do not readily have.
Here, we present Pladipus, a free and open-source framework that greatly facilitates the establishment of distributed computing networks for proteomics bioinformatics tools.
INTRODUCTION
Many modern bioinformatics-related fields increasingly focus on large-scale data processing. This inevitably leads to increased computational complexity, as illustrated by recent efforts to produce a comprehensive mass spectrometry (MS)-based characterization of the human proteome (Kim et al., 2014; Wilhelm et al., 2014). Such complex, high-throughput studies are becoming increasingly popular, but they require high-performance computational setups to be analyzed quickly.
METHODS
Here, we present Pladipus, a generic platform for distributed proteomics software. It provides an end-user-oriented solution for distributing bioinformatics tasks over a network of computers, managed through an intuitive graphical user interface (GUI).
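The general idea of distributing tasks over a network of computers can be illustrated with a pull-based work queue: independent tasks go into a shared queue, and each worker repeatedly takes the next pending task until none remain, so faster machines simply end up processing more tasks. The Python sketch below is a minimal local stand-in under stated assumptions: threads play the role of networked worker machines, and the function `run_distributed` and its `(task_id, callable)` task format are hypothetical illustrations, not Pladipus's actual implementation or API.

```python
import queue
import threading
from typing import Any, Callable, Dict, Hashable, List, Tuple

# Hypothetical task format for this sketch: (task_id, zero-argument callable).
Task = Tuple[Hashable, Callable[[], Any]]


def run_distributed(tasks: List[Task], n_workers: int) -> Dict[Hashable, Any]:
    """Run tasks on n_workers threads that pull work from one shared queue.

    Threads stand in for networked worker machines; this is a local sketch,
    not a distributed implementation (no networking, no fault tolerance).
    """
    job_queue: "queue.Queue[Task]" = queue.Queue()
    for task in tasks:
        job_queue.put(task)

    results: Dict[Hashable, Any] = {}
    lock = threading.Lock()

    def worker() -> None:
        # Keep pulling the next pending task until the queue is empty,
        # so a faster worker simply ends up processing more tasks.
        while True:
            try:
                task_id, func = job_queue.get_nowait()
            except queue.Empty:
                return
            value = func()
            with lock:  # protect the shared results dict
                results[task_id] = value

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    # Illustrative tasks only: square ten numbers using three workers.
    jobs = [(i, lambda i=i: i * i) for i in range(10)]
    print(run_distributed(jobs, n_workers=3))
```

Because workers pull work instead of having it assigned up front, adding capacity only means starting another worker against the same queue; in a networked setup that shared queue would have to be reachable by every participating machine.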
Pladipus comes with several modules that work out of the box: SearchGUI (Vaudel et al., 2011), PeptideShaker (Vaudel et al., 2015), DeNovoGUI (Muth et al., 2014), MsConvert (part of ProteoWizard; Kessner et al., 2008) and three common variants of the BLAST algorithm (Altschul et al., 1990): blastn, blastp and blastx. These modules can be linked together into pipelines tailored to specific needs, including custom in-house algorithms, and the whole pipeline can be executed on an inexpensive, scalable cluster infrastructure without additional cost or expert maintenance. Pladipus can even be set up to let existing idle hardware join the network and participate in the processing.
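Linking modules into a pipeline amounts to running each step only after the steps it depends on have finished. A minimal Python sketch of that idea follows, assuming each step is a plain function that receives the outputs of its upstream steps. `run_pipeline` and the step names `convert`, `search` and `validate` are hypothetical placeholders loosely mirroring a conversion, search and post-processing chain; they do not call MsConvert, SearchGUI or PeptideShaker and do not reflect Pladipus's own pipeline configuration format. Step ordering uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, Set

# Hypothetical step signature: receives a dict of upstream outputs, returns its own output.
Step = Callable[[Dict[str, Any]], Any]


def run_pipeline(steps: Dict[str, Step],
                 dependencies: Dict[str, Set[str]],
                 initial_input: Any) -> Dict[str, Any]:
    """Run pipeline steps in an order that respects their dependencies.

    steps:         step name -> function
    dependencies:  step name -> names of the steps it needs as input
    initial_input: passed (under the key "input") to steps with no dependencies
    """
    unknown = set().union(*dependencies.values()) - set(steps)
    if unknown:
        raise ValueError(f"dependencies on undefined steps: {sorted(unknown)}")

    # Include every step in the graph, even those without dependencies.
    graph = {name: dependencies.get(name, set()) for name in steps}
    outputs: Dict[str, Any] = {}
    # static_order() yields each step after all of its predecessors and
    # raises graphlib.CycleError if the dependencies contain a cycle.
    for name in TopologicalSorter(graph).static_order():
        upstream = graph[name]
        inputs = ({dep: outputs[dep] for dep in upstream}
                  if upstream else {"input": initial_input})
        outputs[name] = steps[name](inputs)
    return outputs


if __name__ == "__main__":
    # Placeholder steps that only build strings, standing in for real tools.
    steps = {
        "convert": lambda inp: f"peaklist({inp['input']})",
        "search": lambda inp: f"ids({inp['convert']})",
        "validate": lambda inp: f"report({inp['search']})",
    }
    deps = {"search": {"convert"}, "validate": {"search"}}
    print(run_pipeline(steps, deps, "run01.raw")["validate"])
    # prints: report(ids(peaklist(run01.raw)))
```

Describing dependencies rather than a fixed sequence lets branching pipelines (for example, one converted file feeding several search steps) be expressed with the same function.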
RESULTS & DISCUSSION
To quantify the benefits of a distributed computing framework, 52 CPTAC experiments (LTQ-Study6: Orbitrap@86) (Paulovich et al., 2010) were searched three times against a protein sequence database (UniProtKB/Swiss-Prot, release 2015_05) on Pladipus networks of various sizes. Three search engines were applied: X!Tandem, Tide and MS-GF+. As expected for a distributed system, the wall time was highly reproducible and decreased nearly exponentially with the number of workers.
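Benchmarks of this kind are commonly summarized as speedup (baseline wall time divided by the wall time with n workers) and parallel efficiency (speedup divided by the factor by which the worker count grew), with the spread across repeated runs indicating reproducibility. Under ideal scaling, doubling the workers halves the wall time, giving an efficiency of 1. The Python sketch below computes these quantities from replicate wall times; the function `summarize_benchmark` and its input layout are assumptions for illustration, and the numbers in its usage example are made up, not the measurements behind Figure 1.

```python
from statistics import mean, stdev
from typing import Dict, List


def summarize_benchmark(replicates: Dict[int, List[float]]) -> Dict[int, Dict[str, float]]:
    """Summarize replicate wall times per worker count.

    replicates: number of workers -> wall times from repeated runs (same unit).
    The smallest worker count is the baseline for speedup and efficiency.
    """
    means = {n: mean(times) for n, times in replicates.items()}
    base_n = min(means)
    summary = {}
    for n in sorted(means):
        times = replicates[n]
        # Coefficient of variation across replicates: lower means more reproducible.
        cv = stdev(times) / means[n] if len(times) > 1 else 0.0
        speedup = means[base_n] / means[n]
        # Efficiency 1.0 means wall time shrank exactly in proportion to the added workers.
        efficiency = speedup / (n / base_n)
        summary[n] = {"mean": means[n], "cv": cv,
                      "speedup": speedup, "efficiency": efficiency}
    return summary


if __name__ == "__main__":
    # Made-up wall times (minutes) for three replicate runs; not measured data.
    example = {1: [120.0, 118.0, 122.0], 2: [61.0, 60.0, 62.0], 4: [32.0, 31.0, 33.0]}
    for n, row in summarize_benchmark(example).items():
        print(n, {k: round(v, 3) for k, v in row.items()})
```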
FIGURE 1. Benchmarking of a Pladipus network (16 GB RAM, 12 cores, 250 GB disk space, Ubuntu Precise).
Pladipus is freely available as open-source software under the permissive Apache License 2.0. Documentation, including example files, an installer and a video tutorial, can be found at https://compomics.github.io/projects/pladipus.html.
REFERENCES
Altschul,S.F. et al. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–10.
Kessner,D. et al. (2008) ProteoWizard: open source software for rapid proteomics tools development. Bioinformatics, 24, 2534–6.
Kim,M.-S. et al. (2014) A draft map of the human proteome. Nature, 509, 575–81.
Muth,T. et al. (2014) DeNovoGUI: an open source graphical user interface for de novo sequencing of tandem mass spectra. J. Proteome Res., 13, 1143–6.
Paulovich,A.G. et al. (2010) Interlaboratory study characterizing a yeast performance standard for benchmarking LC-MS platform performance. Mol. Cell. Proteomics, 9, 242–54.
Vaudel,M. et al. (2011) SearchGUI: An open-source graphical user interface for simultaneous OMSSA and X!Tandem searches. Proteomics, 11, 996–9.
Vaudel,M. et al. (2015) PeptideShaker enables reanalysis of MS-derived proteomics data sets. Nat. Biotechnol., 33, 22–4.
Wilhelm,M. et al. (2014) Mass-spectrometry-based draft of the human proteome. Nature, 509, 582–7.