New Statistical Algorithms for the Analysis of Mass - FU Berlin, FB MI ...
8 CHAPTER 1. INTRODUCTION AND SURVEY
1. Detect the signals contained in the spectra (and filter out noise)
2. Evaluate the signals
3. Identify biomarkers (that is, statistically significant differences between the groups)
4. Build fingerprints and train classifiers using these fingerprints
5. Test the performance (that is, the classification power) of the resulting classifiers in independent clinical studies
A follow-up study then often tries to identify the underlying molecules in order to link the fingerprints to, e.g., metabolic pathways.
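Step 3 of the pipeline above leaves the choice of statistical test open. Purely as an illustration, the following minimal sketch flags peaks whose intensities differ between two groups, using a two-sided permutation test on the difference of group means together with a Bonferroni correction. The test itself, the function names `permutation_p_value` and `candidate_biomarkers`, and the data layout (one list of intensities per peak and group) are hypothetical choices for this example and are not taken from the methods described in this thesis.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in mean intensity.

    a, b: intensities of one peak in group A and group B. Returns a Monte Carlo
    estimate of the p-value, using (hits + 1) / (n_perm + 1) so it is never 0.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign the group labels
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def candidate_biomarkers(peaks_a, peaks_b, alpha=0.05):
    """Return indices of peaks whose intensities differ significantly.

    peaks_a[i] and peaks_b[i] hold the intensities of peak i in each group.
    A Bonferroni correction (alpha / number of peaks) accounts for testing
    many peaks at once.
    """
    n_peaks = len(peaks_a)
    return [
        i
        for i, (a, b) in enumerate(zip(peaks_a, peaks_b))
        if permutation_p_value(a, b) < alpha / n_peaks
    ]


# Peak 0 separates the groups; peak 1 has identical values in both groups.
group_a = [[10, 11, 12, 13, 14], [5, 6, 5, 6, 5]]
group_b = [[1, 2, 3, 2, 1], [5, 6, 5, 6, 5]]
print(candidate_biomarkers(group_a, group_b))  # [0]
```

With a fixed `seed` the estimated p-values are reproducible, but they remain Monte Carlo approximations; Bonferroni is a deliberately conservative choice when many peaks are tested simultaneously.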
Part I: Detecting Fingerprints<br />
There are many (still unsolved) problems associated with each of these pipeline steps. One example is the detection of even the smallest relevant signals in the raw data, which are a complex mix of the true biological signals and the (random and systematic) noise introduced by high-throughput MS machines. In the first part of this thesis we propose a solution to this problem: a new, statistically driven approach that allows us to analyze the noise and to identify signals below the commonly used signal-to-noise threshold² (chapter 3).
The additional signals identified in this way can be used in subsequent steps to build better patterns for proteomic fingerprinting analysis. We believe that this will foster the identification of new biomarkers that could not be detected by most of the algorithms currently available.
Other important issues are also addressed, such as preprocessing the raw signals (e.g., to reduce systematic noise; see section 3.3), reliably mapping detected signals across different spectra to allow comparison (section 3.6), building robust and compact fingerprints (section 3.8), and finally using these fingerprints to classify unknown spectra (section 3.8.5).
Part II: Medical Application<br />
The algorithms and methods developed in this thesis can be combined into an analysis pipeline for the automated detection of fingerprints in, and the analysis of, mass spectrometry data. This pipeline has been set up and equipped with a web front end that gives remote scientists (for example, in hospitals) access to it. To demonstrate the platform's practical relevance, it has been used in several clinical studies, in which we successfully detected fingerprints for different cancer types (bladder, kidney, testis, pancreas, colon, and thyroid). Two of these studies are presented in chapter 4.
Experiments have shown that the fingerprints found by our algorithms are missed by commercially available systems, which are less sensitive than our approach.
² The thresholding method only considers signals whose height exceeds a certain value determined by a noise-estimation step. A common setting for the minimum signal height is three times the estimated noise level.
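The thresholding rule described in this footnote can be sketched as follows. This minimal example assumes a single, roughly baseline-corrected spectrum given as a list of intensities. Because the footnote does not say how the noise is estimated, the sketch uses the median as the baseline and the median absolute deviation (MAD) as a robust noise estimate; both are assumptions, as are the hypothetical function names `estimate_noise` and `threshold_peaks`. A signal is kept if it is a local maximum whose height above the baseline exceeds `snr` times the noise level, with `snr=3.0` matching the common setting mentioned above.

```python
from statistics import median


def estimate_noise(intensities):
    """Return (baseline, noise) for a spectrum.

    Assumption: the baseline is the median intensity and the noise level is the
    median absolute deviation (MAD), scaled by 1.4826 so that it estimates the
    standard deviation of Gaussian noise.
    """
    baseline = median(intensities)
    mad = median(abs(x - baseline) for x in intensities)
    return baseline, 1.4826 * mad


def threshold_peaks(intensities, snr=3.0):
    """Return indices of local maxima whose height above the baseline exceeds
    snr times the estimated noise level (snr=3.0 is the common setting)."""
    baseline, noise = estimate_noise(intensities)
    cutoff = snr * noise
    peaks = []
    # Endpoints are skipped because they have only one neighbour.
    for i in range(1, len(intensities) - 1):
        y = intensities[i]
        is_local_max = y > intensities[i - 1] and y >= intensities[i + 1]
        if is_local_max and y - baseline > cutoff:
            peaks.append(i)
    return peaks


spectrum = [0, 2, 1, 3, 0, 2, 15, 1, 3, 0, 2, 1, 3, 0]
print(threshold_peaks(spectrum))           # [6]
print(threshold_peaks(spectrum, snr=0.5))  # [3, 6, 8, 12]
```

Lowering `snr` admits more local maxima, but the additional ones are indistinguishable from noise by height alone, which illustrates why signals below the usual threshold cannot be recovered by simple thresholding.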