
BeNeLux Bioinformatics Conference – Antwerp, December 7-8 2015

Abstract ID: P

Poster

10th Benelux Bioinformatics Conference bbc 2015

P7. XCMS OPTIMISATION IN HIGH-THROUGHPUT LC-MS QC

Charlie Beirnaert (1,2,*), Matthias Cuykx (3), Adrian Covaci (3) & Kris Laukens (1,2).

(1) Advanced Database Research and Modeling (ADReM), University of Antwerp; (2) Biomedical Informatics Research Centre Antwerp (biomina); (3) Toxicological Centre, University of Antwerp. (*) charlie.beirnaert@uantwerpen.be

In high-throughput untargeted metabolomics studies, quality control (QC) is still a prominent bottleneck. In analogy to a recently developed QC tool for proteomics, work in our research group aims to develop a QC environment specific to metabolomics. One component of this work is the XCMS analysis software for LC-MS data, which is very sensitive to its input parameters. The presented work deals with the automatic optimisation of the XCMS parameters by building on an existing framework for XCMS optimisation. The additions to this framework are the inclusion of quantified resolution data, obtained from the otherwise ignored profile data, and intelligent use of the isotopic profile of measured compounds.

INTRODUCTION

Metabolomics is the study of small molecules, or metabolites. These metabolites have an enormous chemical diversity and are only now starting to be identified in a high-throughput fashion. The reason for this is the adoption of high-performance liquid chromatography mass spectrometry (LC-MS) and nuclear magnetic resonance (NMR) spectroscopy. However, the analysis of these large datasets is not trivial: for LC-MS specifically, there are almost more ways of analysing data than there are researchers. Arguably, the most commonly used software platform for the initial analysis is XCMS (Smith et al., 2006). However, the output of XCMS depends strongly on the input parameters. Often the default parameters are chosen, or they are adapted to the intuition of the researcher, without accounting for side effects such as the introduction of false positives. Optimisation algorithms have been constructed using a dilution series (Eliasson et al., 2012) and using the carbon isotope (Libiseller et al., 2015). In this work, we build on the latter by including quantified information from the profile m/z domain (the continuous data in the m/z dimension), where accurate resolutions can be obtained for the mono-isotopic peaks and other isotopes. The developed optimisation can be used both for data analysis and for the quality control framework that is under development.

METHODS

The proposed work uses XCMS to find the peaks of interest in the data. To optimise this process, the results from XCMS are analysed for the occurrence of peaks and their isotopes. In this step, the raw profile data is inspected around the peaks identified by XCMS, both to quantify the peak resolution and to detect missed isotopes.

Centroid vs profile data: Present-day MS specialists typically use centroid data because the file size is considerably smaller. The mass spectrometer converts the continuous data in the m/z dimension to a collection of spikes, where each approximately Gaussian peak is converted to a single spike (a delta function with the same height as the original peak). All other data is discarded. The result is a large reduction in file size but a loss of the peak shape; as a consequence, the resolution can no longer be quantified.
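The difference between the two representations can be illustrated with a minimal sketch on synthetic data. Everything here is an assumption made for illustration: the function names are hypothetical, the peak is an ideal Gaussian, and the resolution is taken as m/z divided by the full width at half maximum (FWHM), which is a common convention but is not stated in the abstract. The sketch shows that the FWHM can be recovered from the profile points, while the centroid keeps only the apex position and height.

```python
import math


def gaussian_profile(center, fwhm, height, n_points=201, span_sigma=5.0):
    """Sample one ideal Gaussian peak on a regular m/z grid (synthetic profile data)."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    start = center - span_sigma * sigma
    step = 2.0 * span_sigma * sigma / (n_points - 1)
    mz = [start + i * step for i in range(n_points)]
    intensity = [height * math.exp(-0.5 * ((x - center) / sigma) ** 2) for x in mz]
    return mz, intensity


def centroid(mz, intensity):
    """Collapse a profile peak to a single (m/z, height) spike at its apex."""
    apex = max(range(len(mz)), key=intensity.__getitem__)
    return mz[apex], intensity[apex]


def _crossing(mz, intensity, below, above, half):
    """Linearly interpolate the m/z where the intensity equals `half`."""
    frac = (half - intensity[below]) / (intensity[above] - intensity[below])
    return mz[below] + frac * (mz[above] - mz[below])


def fwhm_from_profile(mz, intensity):
    """Estimate the FWHM of a single peak from its profile points.

    Assumes the intensity drops below half maximum on both sides of the
    apex inside the supplied window.
    """
    apex = max(range(len(mz)), key=intensity.__getitem__)
    half = intensity[apex] / 2.0
    left = apex
    while left > 0 and intensity[left] > half:
        left -= 1
    right = apex
    while right < len(mz) - 1 and intensity[right] > half:
        right += 1
    left_mz = _crossing(mz, intensity, left, left + 1, half)
    right_mz = _crossing(mz, intensity, right, right - 1, half)
    return right_mz - left_mz


def resolution(mz_apex, fwhm):
    """Resolving power as m/z divided by FWHM (assumed convention)."""
    return mz_apex / fwhm


if __name__ == "__main__":
    mz, intensity = gaussian_profile(center=500.0, fwhm=0.01, height=1000.0)
    apex_mz, apex_height = centroid(mz, intensity)
    width = fwhm_from_profile(mz, intensity)
    print("centroid:", apex_mz, apex_height)
    print("FWHM from profile:", width)
    print("resolution:", resolution(apex_mz, width))
```

Given only the output of `centroid`, there is no way to call `fwhm_from_profile`, which is the information loss described above.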

Optimisation parameter: A peak and its isotopes are each characterised by a Gaussian in the chromatographic dimension and are spaced apart by approximately 1.0034 Da (the 13C-12C mass difference, for singly charged ions) in the m/z dimension. When an isotope is missing or the extracted peak does not appear in enough samples (for example, in at least 50% of the samples in the sample group), the peak is categorised as “unreliable”. When a peak is present in all samples or has a clear isotopic distribution, it is considered “reliable”. With these measures a so-called peak picking score can be calculated, which in turn can be optimised by a variety of methods. This results in more reliable peaks without increasing the number of false positives.
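The classification above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the `Peak` type, the tolerances, and the function names are hypothetical. The isotope spacing is set to the 13C-12C mass difference of about 1.003355 Da for singly charged ions. A peak counts as reliable here only if it has an isotope partner one spacing above or below it at a similar retention time and is found in enough samples. The score (reliable peaks squared divided by all peaks) is an illustrative choice and not necessarily the one used in this work.

```python
from dataclasses import dataclass

C13_SPACING = 1.003355  # 13C - 12C mass difference in Da (singly charged ions)


@dataclass
class Peak:
    mz: float               # m/z of the peak apex
    rt: float               # retention time in seconds
    sample_fraction: float  # fraction of samples in the group containing the peak


def has_isotope_partner(peak, peaks, mz_tol=0.005, rt_tol=2.0, charge=1):
    """True if another peak sits one isotope spacing above or below `peak`
    at a similar retention time. Tolerances are hypothetical defaults."""
    spacing = C13_SPACING / charge
    for other in peaks:
        if other is peak or abs(other.rt - peak.rt) > rt_tol:
            continue
        if abs(abs(other.mz - peak.mz) - spacing) <= mz_tol:
            return True
    return False


def classify(peaks, min_fraction=0.5):
    """Label each peak as reliable (True) or unreliable (False)."""
    return [
        has_isotope_partner(p, peaks) and p.sample_fraction >= min_fraction
        for p in peaks
    ]


def peak_picking_score(labels):
    """Illustrative score: (number of reliable peaks)^2 / (number of peaks)."""
    if not labels:
        return 0.0
    reliable = sum(labels)
    return reliable ** 2 / len(labels)


if __name__ == "__main__":
    peaks = [
        Peak(mz=300.0, rt=100.0, sample_fraction=1.0),
        Peak(mz=301.003355, rt=100.5, sample_fraction=1.0),  # isotope of the first
        Peak(mz=450.2, rt=200.0, sample_fraction=0.3),       # no partner, too rare
        Peak(mz=512.3, rt=310.0, sample_fraction=1.0),       # no partner
    ]
    labels = classify(peaks)
    print(labels, peak_picking_score(labels))
```

An optimiser would rerun peak picking with different XCMS parameter sets and keep the set that gives the highest score.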

Analysis & quality control: The optimisation of the XCMS parameters is useful not only for the analysis of the data itself but also for quality control in large-scale LC-MS experiments. By quantifying the resolutions of all relevant peaks in a dataset corresponding to a control sample, it is possible to monitor the quality of the spectra. Combined with other QC frameworks, such as iMonDB (Bittremieux et al., 2015), this makes it possible to assure the quality of all experiments in a long-running study.
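One possible way to monitor resolution over the course of a study is sketched below. The approach is an assumption, not something described in the abstract: each control-sample run is summarised by the median of its peak resolutions, and a run is flagged when that median falls outside the mean plus or minus three standard deviations of a set of reference runs. All function names and numbers are hypothetical.

```python
from statistics import mean, median, stdev


def run_summary(peak_resolutions):
    """Summarise one control-sample run by the median of its peak resolutions."""
    return median(peak_resolutions)


def control_limits(reference_medians, n_sigma=3.0):
    """Lower and upper limits from reference runs (mean +/- n_sigma * stdev)."""
    centre = mean(reference_medians)
    spread = stdev(reference_medians)
    return centre - n_sigma * spread, centre + n_sigma * spread


def flag_runs(run_medians, limits):
    """Return True for every run whose median resolution is outside the limits."""
    lower, upper = limits
    return [not (lower <= value <= upper) for value in run_medians]


if __name__ == "__main__":
    # Hypothetical median resolutions of control runs known to be good
    reference = [50000, 50200, 49800, 50100, 49900]
    limits = control_limits(reference)
    # Hypothetical later runs; the second one has clearly degraded resolution
    later_runs = [50050, 49000, 50400]
    print(limits, flag_runs(later_runs, limits))
```

The median is used so that a few badly resolved peaks do not dominate the summary of a run; other summaries or limits may suit a particular instrument better.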

RESULTS & DISCUSSION

The aim is to use the profile data to improve the available optimisation algorithms. It remains to be seen whether the extra information in this data (compared to centroid data) justifies the increased demand for computing resources. Nonetheless, profile data provides a valuable contribution to LC-MS optimisation, because it enables researchers to quantitatively evaluate and improve the m/z resolution.

REFERENCES

Smith C.A. et al. Anal. Chem. 78(3), 779-789, (2006).

Eliasson M. et al. Anal. Chem. 84(15), 6869-6876, (2012).

Libiseller G. et al. BMC Bioinformatics 16:118, (2015).

Bittremieux W. et al. J. Proteome Res. 14(5), 2360-2366, (2015).
