BeNeLux Bioinformatics Conference – Antwerp, December 7-8 2015
Abstract ID: P
Poster
10th Benelux Bioinformatics Conference bbc 2015
P7. XCMS OPTIMISATION IN HIGH-THROUGHPUT LC-MS QC
Charlie Beirnaert 1,2,*, Matthias Cuykx 3, Adrian Covaci 3 & Kris Laukens 1,2
1 Advanced Database Research and Modeling (ADReM), University of Antwerp; 2 Biomedical Informatics Research Centre Antwerp (biomina); 3 Toxicological Centre, University of Antwerp. * charlie.beirnaert@uantwerpen.be
In high-throughput untargeted metabolomics studies, quality control is still a prominent bottleneck. In analogy to a recently developed QC tool for proteomics, work in our research group aims to develop a QC environment specific to metabolomics. One component of this work is the XCMS analysis software for LC-MS data, whose output is very sensitive to its input parameters. The presented work deals with the automatic optimisation of the XCMS parameters by building on an existing framework for XCMS optimisation. The additions to this framework are the inclusion of quantified resolution data, obtained from the otherwise ignored profile data, and intelligent use of the isotopic profile of measured compounds.
INTRODUCTION
Metabolomics is the study of small molecules, or metabolites. These metabolites have an enormous chemical diversity and are only now starting to be identified in a high-throughput fashion. The reason for this is the adoption of high-performance liquid chromatography-mass spectrometry (LC-MS) and nuclear magnetic resonance (NMR) spectroscopy. However, the analysis of these large datasets is not trivial: for LC-MS specifically, there are almost more ways of analysing data than there are researchers. Arguably, the most commonly used software platform for the initial analysis is XCMS (Smith et al., 2006). However, the output of XCMS depends strongly on the input parameters. Often the default parameters are used, or they are adapted according to the researcher's intuition, without accounting for side effects such as the introduction of false positives. Optimisation algorithms have been constructed using a dilution series (Eliasson et al., 2012) and using the carbon isotope (Libiseller et al., 2015). In this work, we build on the latter by including quantified information from the profile m/z domain (the continuous data in the m/z dimension), where accurate resolutions can be obtained for the mono-isotopic peaks and the other isotopes. The developed optimisation can be used both for data analysis and for the quality control framework that is under development.
METHODS
The proposed work uses XCMS to find the peaks of interest in the data. To optimise this process, the results from XCMS are analysed for the occurrence of peaks and their isotopes. In this step, the raw profile data around the peaks identified by XCMS are inspected to quantify the peak resolution and to detect missed isotopes.
Centroid vs Profile data: Modern-day MS specialists use centroid data because the file size is considerably smaller. The mass spectrometer converts the continuous data in the m/z dimension to a collection of spikes, where each approximately Gaussian peak is converted to a single spike (a delta function with the same height as the original peak). All other data are discarded. The result is a huge reduction in file size but a loss of the peak shape; as a result, the resolution can no longer be quantified.
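To make the difference concrete, the following minimal sketch simulates one profile-mode peak, reduces it to a centroid, and estimates the resolution from the profile data. It assumes the common definition of resolution as m/z divided by the full width at half maximum (FWHM); the function names, sampling step and window width are illustrative assumptions, not part of XCMS or of the presented framework.

```python
import math

def gaussian_profile(mz_center, fwhm, height, step=0.0005, half_width=0.05):
    """Simulate a profile-mode peak as (m/z, intensity) samples of a Gaussian."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    n = int(round(half_width / step))
    points = []
    for i in range(-n, n + 1):
        mz = mz_center + i * step
        intensity = height * math.exp(-0.5 * ((mz - mz_center) / sigma) ** 2)
        points.append((mz, intensity))
    return points

def centroid(points):
    """Reduce a profile peak to a single spike at the apex; the shape is lost."""
    return max(points, key=lambda p: p[1])

def fwhm_from_profile(points):
    """Estimate the FWHM by linear interpolation of the half-maximum crossings."""
    apex_idx = max(range(len(points)), key=lambda i: points[i][1])
    half = points[apex_idx][1] / 2.0

    def crossing(indices):
        # Walk away from the apex until the intensity drops below half maximum,
        # then interpolate between the last point above and the first point below.
        prev = apex_idx
        for i in indices:
            if points[i][1] < half:
                (x0, y0), (x1, y1) = points[i], points[prev]
                return x0 + (half - y0) * (x1 - x0) / (y1 - y0)
            prev = i
        raise ValueError("peak does not fall below half maximum in this window")

    left = crossing(range(apex_idx - 1, -1, -1))
    right = crossing(range(apex_idx + 1, len(points)))
    return right - left

def resolution(points):
    """Resolution as apex m/z divided by FWHM (assumed definition)."""
    apex_mz, _ = centroid(points)
    return apex_mz / fwhm_from_profile(points)
```

The centroid keeps only the apex position and height, so `fwhm_from_profile` has nothing to work with once the data are centroided; with the profile samples, a peak simulated at m/z 400 with an FWHM of 0.01 gives a resolution close to 40 000.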
Optimisation parameter: The peaks and their isotopes each have an approximately Gaussian shape in the chromatographic dimension and are spaced apart by approximately 1.0034 Da in the m/z dimension (the 13C-12C mass difference, for singly charged ions). When an isotope is missing or the extracted peak does not appear in enough samples (for example, in fewer than 50% of the samples in the sample group), the peak is categorised as “unreliable”. When a peak is present in all samples or has a clear isotopic distribution, it is considered “reliable”. With these measures a so-called peak picking score can be calculated, which in turn can be optimised by a variety of methods. This results in an increase in reliable peaks without an increase in false positives.
Analysis & Quality control: The optimisation of the XCMS parameters is useful not only in the analysis of the data itself but also in quality control for large-scale LC-MS experiments. By quantifying the resolutions of all relevant peaks in a dataset corresponding to a control sample, it is possible to monitor the quality of the spectra. Combining this with other QC frameworks, such as iMonDB (Bittremieux et al., 2015), makes it possible to assure the quality of all experiments in a long-running study.
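As a rough illustration of how quantified resolutions could feed a QC check, the sketch below summarises each control-sample run by its median peak resolution and flags runs that drift away from a set of baseline runs. The robust threshold (median plus or minus k scaled median absolute deviations), the function name and the data layout are assumptions made for this example; they are not taken from iMonDB or from the framework under development.

```python
from statistics import median

def flag_runs(resolutions_by_run, baseline_runs, k=3.0):
    """Return the ids of runs whose median resolution deviates from baseline.

    resolutions_by_run: dict mapping run id -> list of peak resolutions
                        measured in the control sample of that run.
    baseline_runs:      run ids regarded as known-good reference runs.
    k:                  number of scaled MADs tolerated around the baseline.
    """
    run_medians = {run: median(vals) for run, vals in resolutions_by_run.items()}
    baseline = [run_medians[run] for run in baseline_runs]
    centre = median(baseline)
    mad = median([abs(x - centre) for x in baseline])
    # 1.4826 scales the MAD to match a standard deviation for normal data.
    # If the baseline runs are identical the threshold is 0 and any
    # deviation at all is flagged.
    threshold = k * 1.4826 * mad
    return [run for run, m in run_medians.items() if abs(m - centre) > threshold]
```

A median-based rule is used here because a few poorly resolved peaks or a single bad baseline run should not shift the reference level; any other control-chart rule could be substituted.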
RESULTS & DISCUSSION
The aim is to use the profile data to improve the available optimisation algorithms. It remains to be seen whether the extra information in these data (compared to centroid data) justifies the increased demand on computing resources. Nonetheless, profile data provide a valuable contribution to LC-MS optimisation, because they enable researchers to quantitatively evaluate and improve the m/z resolution.
REFERENCES
Smith C.A. et al. Anal. Chem. 78(3), 779-789 (2006).
Eliasson M. et al. Anal. Chem. 84(15), 6869-6876 (2012).
Libiseller G. et al. BMC Bioinformatics 16:118 (2015).
Bittremieux W. et al. J. Proteome Res. 14(5), 2360-2366 (2015).