New Statistical Algorithms for the Analysis of Mass - FU Berlin, FB MI ...
Figure 3.3.5: Noisy signal (top) and its discrete wavelet transform (using a Symmlet-10 wavelet) at level L = 6. The wavelet coefficients are shown as spikes for the levels −6 to −10. The size and direction of a spike (coefficient) represent its magnitude and sign.
• Linear inverse wavelet transform: $\hat{S} = W^{-1}(Z)$
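The three steps can be illustrated with a minimal pure-Python sketch: a forward transform, shrinkage of the detail coefficients into Z, and the linear inverse transform $\hat{S} = W^{-1}(Z)$. The sketch rests on simplifying assumptions that differ from the setup described in the text:

- The orthonormal Haar wavelet stands in for Symmlet-10.
- The signal length must be divisible by $2^L$.
- One fixed threshold with soft shrinkage is applied to every level.

The function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_dwt(signal, levels):
    """Multi-level orthonormal Haar DWT (stand-in for Symmlet-10).

    Returns (approx, details); details[0] holds the finest scale.
    Assumes len(signal) is divisible by 2**levels.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        half = len(approx) // 2
        a = [(approx[2 * i] + approx[2 * i + 1]) / SQRT2 for i in range(half)]
        d = [(approx[2 * i] - approx[2 * i + 1]) / SQRT2 for i in range(half)]
        details.append(d)
        approx = a
    return approx, details


def haar_idwt(approx, details):
    """Linear inverse transform: rebuild the signal from coarsest to finest."""
    a = list(approx)
    for d in reversed(details):
        out = []
        for ai, di in zip(a, d):
            out.append((ai + di) / SQRT2)
            out.append((ai - di) / SQRT2)
        a = out
    return a


def soft_threshold(c, lam):
    """Set |c| <= lam to zero, otherwise shrink c towards zero by lam."""
    if abs(c) <= lam:
        return 0.0
    return math.copysign(abs(c) - lam, c)


def denoise(signal, levels, lam):
    """S_hat = W^{-1}(Z): only the detail coefficients are shrunk (assumption)."""
    approx, details = haar_dwt(signal, levels)
    shrunk = [[soft_threshold(c, lam) for c in d] for d in details]
    return haar_idwt(approx, shrunk)
```

With `lam = 0.0` the signal is reconstructed unchanged, which illustrates that the transform is invertible. With a very large `lam` and `levels` equal to log2 of the length, every detail coefficient is removed and the result is the constant signal mean.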
Obviously, the choice of the threshold λ is key to successful denoising. Donoho (1995) proposes a global, non-adaptive threshold λ for removing white noise:
$$\lambda = \sigma \cdot \sqrt{2 \log m}, \qquad \sigma = \frac{\mathrm{MAD}}{0.6745}$$
Here m is the length of the signal. σ estimates the noise level from MAD, the median absolute deviation of the coefficients at the first (finest) scale. Dividing by the constant 0.6745 makes the estimate unbiased for normally distributed noise.
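The universal threshold can be computed directly from the finest-scale detail coefficients. The helpers below are hypothetical and are not the author's code. As an assumption, they compute MAD literally as the median of the absolute deviations from the median. Many implementations instead take the median of the absolute coefficients, which presumes the coefficients are centred on zero. The natural logarithm is used.

```python
import math
import statistics


def estimate_sigma(finest_details):
    """Noise level sigma = MAD / 0.6745 from finest-scale detail coefficients.

    MAD is taken literally here: median of |c - median(c)| (an assumption;
    some implementations use median(|c|) instead).
    """
    med = statistics.median(finest_details)
    mad = statistics.median([abs(c - med) for c in finest_details])
    return mad / 0.6745


def universal_threshold(finest_details, m):
    """Global threshold lambda = sigma * sqrt(2 * ln(m)); m = signal length."""
    sigma = estimate_sigma(finest_details)
    return sigma * math.sqrt(2.0 * math.log(m))
```

The threshold grows only slowly with the signal length m. This slow growth is why a single global λ tends to oversmooth long spectra, and it motivates the level-dependent alternatives discussed next.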
Of the shrinkage schemes tested (VisuShrink universal, Minimax, Stein's Unbiased Risk Estimate (SURE) and Minimum Description Length), the SURE approach (Stein, 1981; Donoho and Johnstone, 1995) was found to deliver the best results. It determines a separate threshold for each resolution level (scale) by minimizing Stein's unbiased estimate of the risk. This approach is smoothness-adaptive, which is useful for MALDI-TOF spectra because they can contain spiky as well as smooth peaks. If the unknown signal S contains jumps, the reconstruction $\hat{S}$ contains them as well. If S contains smooth regions, $\hat{S}$ will be as smooth as the basis functions allow.
Figure 3.3.6 shows an example of this shrinkage applied to a typical MALDI-TOF spectrum.
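Level-wise SURE thresholding can be sketched as follows, assuming soft thresholding and a known or previously estimated noise level σ. Let $x_1, \dots, x_n$ be the coefficients of one level, scaled to unit noise. Then

$$\mathrm{SURE}(t) = n - 2\,\#\{i : |x_i| \le t\} + \sum_i \min(|x_i|, t)^2.$$

Between consecutive values of $|x_i|$ the count stays constant while the sum can only grow. The minimum is therefore attained at $t = 0$ or at one of the $|x_i|$, so only those candidates need to be checked. This is a hypothetical sketch with a simple O(n²) search. It omits the hybrid rule of Donoho and Johnstone (1995), which falls back to the universal threshold on very sparse levels.

```python
def sure_risk(t, x):
    """SURE of soft thresholding at t; x must already be divided by sigma."""
    n = len(x)
    n_small = sum(1 for xi in x if abs(xi) <= t)
    return n - 2 * n_small + sum(min(abs(xi), t) ** 2 for xi in x)


def sure_threshold(coeffs, sigma):
    """Threshold for one resolution level that minimizes SURE.

    Candidates are t = 0 and every |x_i|, where the minimum must lie.
    The plain O(n^2) search keeps the sketch short.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    x = [c / sigma for c in coeffs]
    candidates = [0.0] + sorted(abs(xi) for xi in x)
    best_t = min(candidates, key=lambda t: sure_risk(t, x))
    return best_t * sigma  # back on the original coefficient scale


def level_thresholds(details, sigma):
    """One SURE threshold per resolution level (hypothetical helper)."""
    return [sure_threshold(d, sigma) for d in details]
```

On a level holding only small, noise-like coefficients, SURE picks a threshold that removes them all. On a level of large coefficients it picks t = 0 and keeps them. This level-by-level behaviour is what makes the method adaptive.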
Experiments showed that denoising the signal improves the classification accuracy of spectra (at the final stage of our pipeline, see Section 3.8.5) by 4-5% on average, with all other parameters held equal. This agrees well with other studies. For example, Li et al. (2007) report an improvement of 3-8% on similar SELDI-TOF serum data.
3.3.5 Baseline Correction<br />
Baseline correction is performed to remove the comparatively low-frequency noise (the baseline) from the spectrum. Following Breen et al. (2000), Sauve and Speed (2004) and Gröpl et al. (2005), we use a morphological TopHat filter (Zeng et al., 2006).
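A morphological top-hat filter subtracts the opening of the spectrum from the spectrum itself. The opening is an erosion followed by a dilation with a flat structuring element. It follows the slowly varying baseline but cannot fit inside peaks narrower than the structuring element. The minimal sketch below makes these assumptions:

- A flat element of `2 * half_width + 1` points is used.
- Windows are clipped at the ends of the spectrum.
- The parameter name `half_width` is hypothetical, and the settings of the cited works are not reproduced.

```python
def erode(x, half_width):
    """Grey-scale erosion: minimum over a clipped window around each point."""
    n = len(x)
    return [min(x[max(0, i - half_width):min(n, i + half_width + 1)])
            for i in range(n)]


def dilate(x, half_width):
    """Grey-scale dilation: maximum over a clipped window around each point."""
    n = len(x)
    return [max(x[max(0, i - half_width):min(n, i + half_width + 1)])
            for i in range(n)]


def tophat_baseline_correct(x, half_width):
    """Top-hat: spectrum minus its opening (erosion, then dilation).

    Returns (corrected, baseline); the opening serves as baseline estimate.
    The structuring element should be wider than the peaks to be kept.
    """
    baseline = dilate(erode(x, half_width), half_width)
    corrected = [xi - bi for xi, bi in zip(x, baseline)]
    return corrected, baseline
```

The opening never exceeds the signal, so the corrected intensities are non-negative. Near the ends of the spectrum the clipped windows make the baseline estimate less reliable.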