08.02.2013

New Statistical Algorithms for the Analysis of Mass - FU Berlin, FB MI ...


3.3. PREPROCESSING 31

mix (external standard) used are important for reliable results. Not only is the zero point of the particular machine determined, but also machine-specific transformations (machine-dependent constants; see, for example, Equation 3.2.4). Once these parameters have been determined correctly, the measurement errors are corrected automatically during acquisition within the machine's hardware.
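The calibration step can be illustrated with an idealized time-of-flight model. The sketch below assumes that the square root of m/z depends linearly on the flight time t, √(m/z) = c0 + c1·t, and fits the two machine constants c0 and c1 by least squares from calibrant peaks of known mass. This model, the helper names `fit_calibration` and `time_to_mz`, and the calibrant values are illustrative assumptions; the relation is not necessarily the form of Equation 3.2.4, and real instruments may use additional correction terms.

```python
import math

def fit_calibration(times, masses):
    """Least-squares fit of sqrt(m/z) = c0 + c1 * t from calibrant peaks.

    Assumes an idealized linear TOF model (hypothetical helper); returns (c0, c1).
    """
    ys = [math.sqrt(m) for m in masses]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(ys) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    var = sum((t - mean_t) ** 2 for t in times)
    c1 = cov / var
    c0 = mean_y - c1 * mean_t
    return c0, c1

def time_to_mz(t, c0, c1):
    """Convert a flight time to m/z with the fitted machine constants."""
    return (c0 + c1 * t) ** 2

# Illustrative calibrant data generated from c0 = -1.0, c1 = 2.0 (arbitrary units);
# a real external standard would supply measured flight times of known masses.
true_c0, true_c1 = -1.0, 2.0
calib_times = [10.0, 20.0, 35.0, 50.0]
calib_masses = [(true_c0 + true_c1 * t) ** 2 for t in calib_times]

c0, c1 = fit_calibration(calib_times, calib_masses)
print("fitted constants:", round(c0, 6), round(c1, 6))
print("m/z at t = 42:", round(time_to_mz(42.0, c0, c1), 6))
```

With noise-free calibrants the fit recovers the generating constants; with real measurements the residuals of the fit indicate how well the assumed model describes the instrument.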

Shifts in intensity direction (I): A biological sample (e.g. blood) inevitably undergoes some changes in its biochemical content, mainly caused by proteases (see section 4.1.1). Furthermore, when it is mixed with the so-called matrix, a rather inhomogeneous mixture forms, which then becomes the final sample put into the MS machine. Within this inhomogeneous sample there exist so-called sweet spots where the density of proteins is much higher than average. If the laser beam hits these sweet spots, the measured intensities of the molecules located there increase excessively.
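If the sweet-spot effect is idealized as a single factor that scales every intensity of one acquisition, it can be removed by normalization. The sketch below assumes exactly that (real sweet-spot effects need not be uniform across the m/z range) and uses total-ion-current (TIC) normalization as one common choice; this is not necessarily the normalization used in this work, and `tic_normalize` and the intensity values are hypothetical.

```python
def tic_normalize(intensities):
    """Scale a spectrum so that its intensities sum to 1 (total ion current)."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("spectrum has no positive total intensity")
    return [y / total for y in intensities]

# Hypothetical underlying profile and two acquisitions of it: one from an
# average region and one where the laser hit a sweet spot (assumed uniform 3x boost).
profile = [0.0, 2.0, 10.0, 4.0, 1.0]
average_shot = [1.0 * y for y in profile]
sweet_spot_shot = [3.0 * y for y in profile]

a = tic_normalize(average_shot)
b = tic_normalize(sweet_spot_shot)
# After normalization the multiplicative factor cancels.
print(all(abs(x - y) < 1e-12 for x, y in zip(a, b)))
```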

3.3.3 Our Approach

In the following, we describe the algorithms we have developed (or modified to fit our needs) to recover, step by step, the original signals present in the sample put into the MS machine. The overall procedure is as follows:

Algorithm 1 Preprocessing

Require: Raw spectrum as (x, y) value pairs

Apply wavelet-based denoising

Apply top-hat-based baseline reduction

Apply normalization

return Preprocessed spectrum
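Algorithm 1 only names the three stages. A minimal pure-Python sketch of how they might be chained is given below. The concrete choices are assumptions, not the implementation used in this work: a Haar wavelet with soft thresholding and the universal threshold stands in for the wavelet-based denoising, a white top-hat (signal minus its morphological opening with a flat window) for the baseline reduction, and total-ion-current scaling for the normalization. The function names, the number of decomposition levels, the window half-width and the synthetic spectrum are hypothetical.

```python
import math
import random
import statistics

def haar_denoise(y, levels=3):
    """Haar wavelet shrinkage: soft-threshold all detail coefficients.

    Threshold: universal rule sigma * sqrt(2 ln n), with sigma estimated from
    the finest-level details via the median absolute deviation.
    """
    n = len(y)
    block = 2 ** levels
    padded = list(y) + [y[-1]] * ((-n) % block)  # edge-pad to a multiple of 2**levels
    s2 = math.sqrt(2.0)
    approx, details = padded, []
    for _ in range(levels):  # forward transform
        half = len(approx) // 2
        details.append([(approx[2 * i] - approx[2 * i + 1]) / s2 for i in range(half)])
        approx = [(approx[2 * i] + approx[2 * i + 1]) / s2 for i in range(half)]
    sigma = statistics.median([abs(c) for c in details[0]]) / 0.6745
    thr = sigma * math.sqrt(2.0 * math.log(len(padded)))
    for d in reversed(details):  # inverse transform with shrunk details
        shrunk = [math.copysign(max(abs(c) - thr, 0.0), c) for c in d]
        rebuilt = []
        for a_i, d_i in zip(approx, shrunk):
            rebuilt.extend(((a_i + d_i) / s2, (a_i - d_i) / s2))
        approx = rebuilt
    return approx[:n]

def tophat_baseline(y, half_width):
    """White top-hat: subtract the morphological opening (erosion, then dilation).

    The opening follows the slowly varying baseline but cannot fit peaks
    narrower than the window of 2 * half_width + 1 points.
    """
    n = len(y)

    def window(i):
        return range(max(0, i - half_width), min(n, i + half_width + 1))

    eroded = [min(y[j] for j in window(i)) for i in range(n)]
    opened = [max(eroded[j] for j in window(i)) for i in range(n)]
    return [v - o for v, o in zip(y, opened)]

def preprocess(y, levels=3, half_width=10):
    """Sketch of Algorithm 1: denoising -> baseline reduction -> normalization."""
    denoised = haar_denoise(y, levels)
    corrected = tophat_baseline(denoised, half_width)
    total = sum(corrected)
    return [v / total for v in corrected] if total > 0 else corrected

# Synthetic spectrum: linear baseline + one peak at index 128 + Gaussian noise.
rng = random.Random(0)
raw = [0.05 * x + 50.0 * math.exp(-((x - 128) / 3.0) ** 2) + rng.gauss(0.0, 1.0)
       for x in range(256)]
result = preprocess(raw)
peak_index = max(range(len(result)), key=result.__getitem__)
print(len(result), peak_index, round(sum(result), 6))
```

The window half-width in `tophat_baseline` has to exceed the half-width of the widest peak that should survive; otherwise the opening follows the peak and the top-hat removes part of it.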

3.3.4 Denoising

Denoising the raw data X tries to remove whatever noise (ε) is present in X while retaining whatever signal S is present. (Note that baseline removal is handled separately; see section 3.3.5.) This is not to be confused with smoothing, which removes the high frequencies present in the data and retains the low ones, as opposed to denoising, which attempts to remove whatever noise is present. Denoising generally yields better results in subsequent steps of the analysis workflow, since some general assumptions about smoothness can be made. However, in practice most of the noise in MALDI-TOF spectra is indeed contained in the high-frequency component of a spectrum. This noise stems from a number of factors, such as electrical interference, random ion motions, statistical fluctuations in the detector gain, or chemical impurities (see e.g. Shin et al., 2007). The literature offers several (heuristic) approaches to noise reduction, such as moving average filters (Liu, Krishnapuram, Pratapa, Liao, Hartemink and Carin, 2003), Gaussian kernel filters (Wang, Howard, Campa, Patz and Fitzgerald, 2003), or PCA (Statheropoulos et al., 1999). However, most of these (parametric) noise reduction approaches were established on the basis of empirical insights, and their parameters need to be fine-tuned to make the method work properly, which is always case-specific and time-consuming.
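The parameter sensitivity of such filters can be seen on a single narrow peak. The sketch below applies a generic moving-average filter and a generic Gaussian kernel filter to a noise-free synthetic peak of height 100 and reports how much of its height survives for increasing window sizes. These are textbook implementations, not the specific filters of Liu et al. or Wang et al.; the function names, the peak shape and the window parameters are illustrative assumptions.

```python
import math

def moving_average(y, half_width):
    """Mean over a centred window of 2 * half_width + 1 samples (truncated at edges)."""
    n = len(y)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out

def gaussian_smooth(y, sigma):
    """Convolve with a Gaussian kernel truncated at 3 sigma, renormalized at edges."""
    radius = max(1, int(3 * sigma))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    n = len(y)
    out = []
    for i in range(n):
        acc = norm = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < n:
                acc += w * y[j]
                norm += w
        out.append(acc / norm)
    return out

# A narrow synthetic peak of height 100 centred at index 50 (noise-free).
peak = [100.0 * math.exp(-0.5 * ((x - 50) / 1.5) ** 2) for x in range(101)]

ma_heights = [max(moving_average(peak, hw)) for hw in (1, 3, 6)]
gauss_heights = [max(gaussian_smooth(peak, s)) for s in (1.0, 3.0, 6.0)]
print("moving average, half-width 1/3/6:", [round(h, 1) for h in ma_heights])
print("Gaussian kernel, sigma 1/3/6:", [round(h, 1) for h in gauss_heights])
```

The wider the window, the more the peak is flattened, while a window that is too narrow barely suppresses noise; the appropriate setting depends on the peak widths of the data at hand.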

Our studies have shown that the approaches mentioned above are not very well suited for reducing noise in MALDI/SELDI-TOF spectra: for example,
