New Statistical Algorithms for the Analysis of Mass - FU Berlin, FB MI ...
3.3. PREPROCESSING 31<br />
mix (external standard) used are important for reliable results. Not only<br />
is the zero-point of the particular machine determined, but also machine-<br />
specific transformations (machine-dependent constants, see for example<br />
Equation 3.2.4). Once these parameters are determined correctly,<br />
the measurement errors are corrected automatically during acquisition<br />
within the machine's hardware.<br />
Shifts in intensity direction (I) A biological sample (e.g. blood) automatically<br />
undergoes some changes in its biochemical content, mainly<br />
caused by proteases (see section 4.1.1). Furthermore, when mixed with<br />
the so-called matrix, a quite inhomogeneous mixture forms, which then<br />
becomes the final sample that is put into the MS machine. Within this inhomogeneous<br />
sample there exist so-called sweet spots where the density of<br />
proteins is much higher than average. If the laser beam hits one of these<br />
sweet spots, the measured intensity of these molecules increases excessively.<br />
3.3.3 Our Approach<br />
In the following, we describe the algorithms we have developed (or modified<br />
to fit our needs) to recover, step by step, the original signals available in the<br />
sample put into the MS machine. The overall procedure is as follows:<br />
Algorithm 1 Preprocessing<br />
Require: Raw Spectrum as x, y value pairs<br />
Apply wavelet-based de-noising<br />
Apply tophat-based baseline reduction<br />
Apply normalization<br />
return Preprocessed spectrum<br />
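The three steps of Algorithm 1 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation described in this work: the function names (`haar_denoise`, `tophat_baseline`, `normalize_tic`, `preprocess`) are hypothetical, the wavelet step is reduced to a single-level Haar transform with soft thresholding, the tophat uses a flat window of assumed half-width, and normalization is assumed to be division by the total intensity. The parameter values are arbitrary.

```python
import math


def haar_denoise(y, threshold):
    """Single-level Haar shrinkage (assumed stand-in for the wavelet step)."""
    n = len(y) - len(y) % 2              # a trailing odd sample is left as-is
    s = math.sqrt(2.0)
    out = list(y)
    for i in range(0, n, 2):
        approx = (y[i] + y[i + 1]) / s   # low-frequency coefficient
        detail = (y[i] - y[i + 1]) / s   # high-frequency coefficient
        # Soft thresholding: small details are set to zero, large ones shrunk.
        shrunk = math.copysign(max(abs(detail) - threshold, 0.0), detail)
        out[i] = (approx + shrunk) / s   # inverse Haar transform
        out[i + 1] = (approx - shrunk) / s
    return out


def tophat_baseline(y, half_width):
    """Subtract the morphological opening (erosion, then dilation) of y."""
    n = len(y)

    def window(values, i):
        return values[max(0, i - half_width):min(n, i + half_width + 1)]

    eroded = [min(window(y, i)) for i in range(n)]
    opened = [max(window(eroded, i)) for i in range(n)]  # baseline estimate
    return [v - b for v, b in zip(y, opened)]


def normalize_tic(y):
    """Divide by the summed intensity (assumed normalization scheme)."""
    total = sum(y)
    return [v / total for v in y] if total > 0 else list(y)


def preprocess(x, y, threshold=0.5, half_width=3):
    """Apply the steps in the order given by Algorithm 1."""
    y = haar_denoise(y, threshold)
    y = tophat_baseline(y, half_width)
    y = normalize_tic(y)
    return list(zip(x, y))


# Toy spectrum: a sloping baseline with one peak at index 8.
x = list(range(16))
raw = [5.0 + 0.1 * i for i in x]
raw[8] += 20.0
spectrum = preprocess(x, raw)
```

The order matters in this sketch: the baseline is estimated on the denoised signal, and normalization comes last so that the baseline does not contribute to the total intensity.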
3.3.4 Denoising<br />
Denoising the raw data X tries to remove whatever noise (ɛ) is present in X<br />
while retaining whatever signal S is present. (Note that baseline removal<br />
is handled separately; see section 3.3.5.) This is not to be confused with<br />
smoothing, which removes the high frequencies present in the data and retains<br />
the low ones, whereas denoising attempts to remove whatever noise is<br />
present, regardless of frequency. Denoising generally yields better results in subsequent steps of the<br />
analysis workflow, since some general assumptions about smoothness can be<br />
made. However, in practice most of the noise in MALDI-TOF spectra is<br />
indeed contained in the high-frequency component of a spectrum. This is<br />
mainly due to a number of factors, such as electrical interference, random ion<br />
motions, statistical fluctuation in the detector gain, or chemical impurities (see<br />
e.g. Shin et al., 2007). There are several (heuristic) approaches for noise<br />
reduction in the literature, such as moving average filters (Liu, Krishnapuram,<br />
Pratapa, Liao, Hartemink and Carin, 2003), Gaussian kernel filters (Wang,<br />
Howard, Campa, Patz and Fitzgerald, 2003), or PCA (Statheropoulos et al.,<br />
1999). However, most of these (parametric) noise reduction approaches have<br />
been established based on empirical insights, and their parameters need to be<br />
fine-tuned to make the method work properly; this tuning is always case-dependent<br />
and time-consuming.<br />
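To illustrate why the parameters of such filters need tuning, the following sketch applies a moving average and a Gaussian kernel filter to a toy signal with a single narrow peak. It is not taken from the cited works: the function names, the border handling (truncated and renormalised windows), the kernel radius of three standard deviations, and the toy data are all assumptions chosen for illustration.

```python
import math


def moving_average(y, half_width):
    """Mean over a centred window, truncated at the borders."""
    n = len(y)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out


def gaussian_filter(y, sigma):
    """Weighted mean with Gaussian weights, renormalised at the borders."""
    n = len(y)
    radius = int(math.ceil(3 * sigma))   # assumed kernel cut-off
    out = []
    for i in range(n):
        num = den = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            w = math.exp(-0.5 * ((i - j) / sigma) ** 2)
            num += w * y[j]
            den += w
        out.append(num / den)
    return out


# Toy spectrum: one narrow peak of height 10 on a flat background.
y = [0.0] * 21
y[10] = 10.0

# Report the remaining peak height for several parameter choices.
for half_width in (1, 2, 4):
    print("moving average, half width", half_width,
          "-> peak", round(max(moving_average(y, half_width)), 3))
for sigma in (0.5, 1.0, 2.0):
    print("gaussian, sigma", sigma,
          "-> peak", round(max(gaussian_filter(y, sigma)), 3))
```

In this toy case the remaining peak height falls as the window or kernel widens, so a setting that suppresses noise well may also flatten narrow peaks. A suitable value depends on the peak width in the spectrum at hand.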
Our studies have shown that the approaches mentioned above are not very<br />
well suited for reducing noise in MALDI/SELDI-TOF spectra: for example,