05.06.2013 Views

PNNL-13501 - Pacific Northwest National Laboratory

Probabilistic Methods Development for Comprehensive Analysis of Complex Spectral and Chromatographic Data

Study Control Number: PN00076/1483

Kristin H. Jarman, Alan R. Willse, Karen L. Wahl, Jon H. Wahl

Techniques such as chromatography, spectrometry, and spectroscopy are being used both in basic research, to improve understanding of biological and biochemical processes, and in field-portable instrumentation, to perform rapid, on-site chemical analysis in complex environments. Under this project, new data analysis methods are being developed for spectral and chromatographic data, providing algorithms that are automated, scientifically interpretable, and versatile enough to analyze such data under complex backgrounds or varying experimental conditions.

Project Description

The purpose of this project was to develop new methods for the analysis of spectral or chromatographic data that apply to a wide variety of instrumentation and can be readily adapted to either laboratory or field application. This research draws from modern probabilistic reasoning and statistics, where similarities between two data sets are quantified based on the properties of the peaks within the sets. The goal of this research was to advance the state of the art by providing data analysis methods that are automated, scientifically interpretable, and versatile enough to analyze spectral or chromatographic data under complex backgrounds or varying experimental conditions. The methods developed are initially being applied to and tested on specific problems in matrix-assisted laser desorption/ionization (MALDI) spectrometry and gas chromatography.

Introduction

Analytical instrumentation techniques are evolving rapidly. Various techniques are being developed for basic research, while at the same time many of them are being developed and deployed as field-portable chemical or biological analysis tools. The tremendous potential of new and existing analytical techniques has resulted in rapid progress in hardware and methods development. In a research setting, it is now possible to analyze sophisticated biochemical or physical processes. In a field development setting, it is possible to analyze samples in environments that generate complex, nonhomogeneous background signals. In both cases, the result is a complex spectrum and/or chromatogram.

Current data analysis methods, such as principal-components-based methods (Seber 1984; Johnson and Wichern 1992), tend to place nearly all of their emphasis on peak intensities. This is appropriate for some applications but not others. For example, intensity-based methods are appropriate in gas chromatography, where quantitation of analytes is often of interest. On the other hand, mass spectrometry is often used to identify analytes present in a sample based on the locations of the peaks that appear. Here it can be difficult to obtain reproducible relative peak intensities, and therefore the presence or absence of peaks is of more interest than peak heights. In such applications, intensity-based data analysis methods such as principal components analysis do not reliably capture the features of interest in a spectrum, and a more subjective, manual interpretation tends to be used.
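The contrast between intensity-based and location-based comparison can be illustrated with a minimal Python sketch. It is not the method developed in this project. The peak lists, the fixed-width binning, and the function names are all hypothetical, and the Jaccard and cosine measures are generic stand-ins chosen for illustration. Two replicate spectra share the same peak locations but have very different relative intensities: a presence/absence comparison treats them as identical, while an intensity-based comparison does not.

```python
import math

def bin_index(mz, mz_min, bin_width):
    """Map an m/z value to a fixed-width bin (assumed binning scheme)."""
    return int((mz - mz_min) // bin_width)

def presence_vector(peaks, mz_min, mz_max, bin_width):
    """Binary fingerprint: 1 if at least one peak falls in the bin, else 0."""
    n_bins = int(math.ceil((mz_max - mz_min) / bin_width))
    vec = [0] * n_bins
    for mz, _intensity in peaks:
        if mz_min <= mz < mz_max:
            vec[bin_index(mz, mz_min, bin_width)] = 1
    return vec

def intensity_vector(peaks, mz_min, mz_max, bin_width):
    """Summed peak intensity per bin."""
    n_bins = int(math.ceil((mz_max - mz_min) / bin_width))
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        if mz_min <= mz < mz_max:
            vec[bin_index(mz, mz_min, bin_width)] += intensity
    return vec

def jaccard(a, b):
    """Shared peaks divided by peaks present in either fingerprint."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0

def cosine(a, b):
    """Cosine similarity of two intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical replicate peak lists as (m/z, intensity) pairs:
# same peak locations, poorly reproducible relative intensities.
spectrum_a = [(1000.2, 100.0), (2500.7, 10.0), (4300.1, 5.0)]
spectrum_b = [(1000.4, 8.0), (2500.5, 90.0), (4300.3, 40.0)]

MZ_MIN, MZ_MAX, BIN_WIDTH = 0.0, 5000.0, 1.0

location_similarity = jaccard(
    presence_vector(spectrum_a, MZ_MIN, MZ_MAX, BIN_WIDTH),
    presence_vector(spectrum_b, MZ_MIN, MZ_MAX, BIN_WIDTH),
)
intensity_similarity = cosine(
    intensity_vector(spectrum_a, MZ_MIN, MZ_MAX, BIN_WIDTH),
    intensity_vector(spectrum_b, MZ_MIN, MZ_MAX, BIN_WIDTH),
)

print("presence/absence similarity:", location_similarity)
print("intensity-based similarity: ", intensity_similarity)
```

Fixed-width binning is the simplest choice but can split a peak that drifts across a bin edge; a tolerance-based peak matching step would be a natural refinement.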

Under this project, we developed a unified statistical framework through which many different types of analytical data can be interpreted. This approach incorporates recent research in both discrete and continuous multivariate statistical methods. The underlying model is general enough to handle different types of analytical data, where different spectral features are of interest. In the first year of this project, methods for hypothesis testing and process control of spectral data were developed based on this model. A comparison against more traditional methods demonstrates that this approach improves upon them.
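To make the idea of hypothesis testing on discrete spectral features concrete, the following Python sketch scores a binary peak fingerprint against two candidate classes with a log-likelihood ratio. It is an illustration only and should not be read as the model developed in this project: it assumes that peak appearances in different bins are independent Bernoulli events, it ignores intensities entirely, and all names and data in it are hypothetical.

```python
import math

def peak_probabilities(replicates, smoothing=0.5):
    """Estimate, for each bin, the probability that a peak appears.

    `replicates` is a list of equal-length binary fingerprints.
    Additive smoothing (an assumed choice) keeps estimates away from 0 and 1.
    """
    n = len(replicates)
    n_bins = len(replicates[0])
    return [
        (sum(rep[j] for rep in replicates) + smoothing) / (n + 2 * smoothing)
        for j in range(n_bins)
    ]

def log_likelihood(fingerprint, probs):
    """Log-likelihood of a fingerprint under independent Bernoulli bins."""
    return sum(
        math.log(p) if present else math.log(1.0 - p)
        for present, p in zip(fingerprint, probs)
    )

def log_likelihood_ratio(fingerprint, probs_h1, probs_h0):
    """Positive values favor H1 (reference class), negative favor H0."""
    return log_likelihood(fingerprint, probs_h1) - log_likelihood(fingerprint, probs_h0)

# Hypothetical training fingerprints over five bins.
reference_replicates = [
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 0],
]
background_replicates = [
    [0, 0, 1, 0, 1],
    [0, 1, 1, 0, 1],
    [0, 0, 1, 0, 0],
]

probs_reference = peak_probabilities(reference_replicates)
probs_background = peak_probabilities(background_replicates)

# Hypothetical test fingerprints.
llr_reference_like = log_likelihood_ratio([1, 1, 0, 1, 0], probs_reference, probs_background)
llr_background_like = log_likelihood_ratio([0, 0, 1, 0, 1], probs_reference, probs_background)

print("LLR, reference-like spectrum: ", llr_reference_like)
print("LLR, background-like spectrum:", llr_background_like)
```

A decision threshold on the ratio would then set the trade-off between false alarms and missed detections; extending the sketch to intensities would require a continuous model conditional on peak presence.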

Approach

The research under this project draws from a unique mathematical model for spectral and chromatographic data. The exact form of this multi-stage model is specified by the spectral features of interest for a particular application. Therefore, it is general enough to handle mass spectral data, where peak locations are of primary interest but intensities are not. It can also handle chromatographic data, where both peak locations and relative intensities may be of interest. In addition, this
