08.02.2013 Views

New Statistical Algorithms for the Analysis of Mass - FU Berlin, FB MI ...


3.3. PREPROCESSING

Figure 3.3.5: Noisy signal (top) and its discrete wavelet transform (using a Symmlet-10) at level L = 6. The wavelet coefficients are shown as spikes for the levels (−6 … −10). The size and direction of a spike (coefficient) represent its magnitude and sign.

• Linear inverse wavelet transform: $\hat{S} = W^{-1}(Z)$

Selecting the threshold $\lambda$ is key to successful denoising. A global, non-adaptive $\lambda$ for removing white noise was proposed by Donoho (1995):

$$\lambda = \sigma \cdot \sqrt{2 \cdot \log(m)}, \qquad \sigma = \frac{\mathrm{MAD}}{0.6745}$$

where $m$ is the length of the signal and MAD is an estimator of the noise level, determined by the median absolute deviation in the first scale (the constant 0.6745 makes the estimate unbiased for the normal distribution).
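
The universal threshold above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used here: the function names and the sample coefficients are hypothetical, and the MAD is taken around the median of the first-scale coefficients (some implementations assume a zero median instead).

```python
import math
import statistics


def mad_sigma(finest_detail):
    """Noise estimate sigma = MAD / 0.6745 from the first-scale coefficients."""
    med = statistics.median(finest_detail)
    # Median absolute deviation around the median. Some implementations
    # assume a zero median and use median(|d|) instead (assumption here).
    mad = statistics.median([abs(d - med) for d in finest_detail])
    return mad / 0.6745


def universal_threshold(sigma, m):
    """Global threshold lambda = sigma * sqrt(2 * log(m)) for signal length m."""
    return sigma * math.sqrt(2.0 * math.log(m))


# Hypothetical first-scale coefficients, for illustration only.
d1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
sigma = mad_sigma(d1)
lam = universal_threshold(sigma, m=1024)
```

Because the threshold grows only with $\sqrt{\log(m)}$, doubling the signal length raises $\lambda$ only slightly. The noise estimate $\sigma$ has a much larger effect on the result.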

Of the tested shrinkage schemes (VisuShrink universal, Minimax, Stein's Unbiased Risk Estimate (SURE), and Minimum Description Length), the SURE approach (Stein, 1981; Donoho and Johnstone, 1995) was found to deliver the best results. It determines a threshold for each resolution level (scale) by minimizing Stein's Unbiased Risk Estimate. This approach is smoothness-adaptive and has some interesting properties when used for MALDI-TOF spectra, which can contain spiky as well as smooth peaks: if the unknown signal $S$ contains jumps, the reconstruction $\hat{S}$ does too, and if $S$ contains smooth regions, $\hat{S}$ will be as smooth as the basis functions allow. Figure 3.3.6 shows an example of this shrinkage applied to a typical MALDI-TOF spectrum.
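
The level-wise SURE threshold selection can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used here. It assumes soft thresholding, coefficients rescaled to unit noise variance, and the standard risk expression for a threshold $t$ on $n$ coefficients $x_i$, $\mathrm{SURE}(t) = n - 2 \cdot \#\{i : |x_i| \le t\} + \sum_i \min(|x_i|, t)^2$, minimized over $t \in [0, \sqrt{2 \log n}]$. All function names are hypothetical, and the hybrid fallback that Donoho and Johnstone use for very sparse levels is omitted.

```python
import math


def sure_risk(x, t):
    """Stein's unbiased risk estimate for soft thresholding at t (unit noise)."""
    n = len(x)
    below = sum(1 for v in x if abs(v) <= t)
    return n - 2 * below + sum(min(abs(v), t) ** 2 for v in x)


def sure_threshold(coeffs, sigma=1.0):
    """Threshold minimizing SURE for the coefficients of one resolution level."""
    x = [c / sigma for c in coeffs]
    cap = math.sqrt(2.0 * math.log(len(x)))
    # The minimum is attained at 0 or at one of the |x_i| not exceeding the cap.
    candidates = [0.0] + sorted(abs(v) for v in x if abs(v) <= cap)
    best = min(candidates, key=lambda t: sure_risk(x, t))
    return sigma * best


def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; zero those with |c| <= t."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def sure_shrink(levels, sigma=1.0):
    """Apply a separately chosen SURE threshold to each level (list of lists)."""
    return [soft_threshold(lv, sure_threshold(lv, sigma)) for lv in levels]
```

Since each level gets its own threshold, a level dominated by noise is shrunk strongly while a level carrying a few large peak coefficients is shrunk only lightly.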

Experiments have shown that denoising the signal improves the classification accuracy of spectra (at the final stage of our pipeline, see section 3.8.5) by 4-5% on average, all other parameters being equal. This is in good accordance with other studies, for example Li et al. (2007), who achieve an improvement of 3-8% on similar SELDI-TOF serum data.

3.3.5 Baseline Correction

A baseline correction is performed to remove this rather low-frequency noise from the spectrum. Following (Breen et al., 2000; Sauve and Speed, 2004; Gröpl et al., 2005) we use a morphological TopHat filter (Zeng et al., 2006).
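
As a rough illustration of the idea, a one-dimensional morphological TopHat filter subtracts the morphological opening (an erosion followed by a dilation) from the signal. The sketch below assumes a flat structuring element; the function names and the `half_width` parameter are hypothetical, and the window size actually used is not stated here.

```python
def erode(y, half_width):
    """Sliding-window minimum with a flat window of 2 * half_width + 1 points."""
    n = len(y)
    return [min(y[max(0, i - half_width):min(n, i + half_width + 1)])
            for i in range(n)]


def dilate(y, half_width):
    """Sliding-window maximum with the same flat window."""
    n = len(y)
    return [max(y[max(0, i - half_width):min(n, i + half_width + 1)])
            for i in range(n)]


def tophat(y, half_width):
    """Signal minus its morphological opening (the estimated baseline)."""
    baseline = dilate(erode(y, half_width), half_width)
    return [a - b for a, b in zip(y, baseline)]


# Hypothetical intensities: a constant offset of 10 with one narrow peak.
spectrum = [10, 10, 10, 15, 10, 10, 10]
corrected = tophat(spectrum, half_width=1)
```

The window must be wider than the peaks to be preserved. Structures narrower than the window are removed by the opening and therefore survive in the corrected signal, while slowly varying structures are assigned to the baseline and subtracted.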
