New Statistical Algorithms for the Analysis of Mass - FU Berlin, FB MI ...
3.6. PEAK REGISTRATION (ALIGNMENT)
PP being the set of peak properties. D_A(C_i) is constructed as follows:
Kernel Density Estimation (KDE, see footnote 15) (Scheather, 2004) is performed on A(x_k) ∀ x_k ∈
C_i, then interpolated and scaled to result in a probability density function.
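The construction above can be sketched as follows. This is a minimal illustration and not the implementation used in this work: the function names, the bandwidth, the grid and the sample values are assumptions, and the interpolation step is reduced to evaluating the estimate on a fixed grid.

```python
import math

def gaussian_kde_pdf(values, grid, bandwidth):
    """Parzen-window estimate with a Gaussian kernel, evaluated on `grid`."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
        for g in grid
    ]

def rescale_to_unit_area(grid, density):
    """Rescale a sampled density so that its trapezoidal integral is 1."""
    area = sum(
        0.5 * (density[i] + density[i + 1]) * (grid[i + 1] - grid[i])
        for i in range(len(grid) - 1)
    )
    return [d / area for d in density]

# Hypothetical values of one peak property (e.g. height) within one cluster.
heights = [10.2, 10.8, 11.1, 14.9, 15.3, 15.6]
grid = [8.0 + 0.1 * i for i in range(101)]  # 8.0, 8.1, ..., 18.0
pdf = rescale_to_unit_area(grid, gaussian_kde_pdf(heights, grid, bandwidth=0.5))
```

The sample values were chosen to form two groups, so the resulting density has two modes, which is the situation described in the following paragraph.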
We have used this particular clustering approach since preliminary experiments
have shown that this technique was best able to resolve the clusters in
the |PP|-dimensional space we work in (typically |PP| = 6 . . . 8). Analyses
have shown that the D_A(C_i) are usually not unimodal, thus favoring a clustering
scheme that, unlike k-means clustering, does not create spherical clusters
(see e.g. Deonia et al., 2007).
Determine Cluster Properties<br />
This step determines the properties (such as center, height, . . . ) of
the clusters found in the previous step. For each property, a KDE k
is performed on the values of the single peaks the masterpeak consists
of. If k is not similar to a normal distribution (tested by: Kolmogorov-Smirnov
test, Lilliefors test, Anderson-Darling test, Ryan-Joiner test,
Shapiro-Wilk test, Normal probability plot (rankit plot), Normality
test, Jarque-Bera test), the masterpeak is flagged.
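As an illustration of the flagging rule, the following sketch applies a single test from the list, the Jarque-Bera test, directly to the raw property values rather than to the KDE k. The significance level `alpha`, the function names and the sample values are assumptions, and the chi-square approximation used for the p-value is only asymptotically valid, so it is unreliable for small samples.

```python
import math

def jarque_bera(values):
    """Return the Jarque-Bera statistic and its asymptotic p-value."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # biased central moments
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Survival function of the chi-square distribution with 2 degrees of freedom.
    p_value = math.exp(-jb / 2.0)
    return jb, p_value

def is_flagged(values, alpha=0.05):
    """Flag a masterpeak property whose values look non-normal."""
    _, p_value = jarque_bera(values)
    return p_value < alpha

# Hypothetical property values: a symmetric sample and one with an outlier.
symmetric = [-2.0, -1.0, -1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 2.0]
with_outlier = [0.0] * 19 + [10.0]
```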
Merge Similar Clusters<br />
Since the clustering in the previous step is not deterministic, this step
repairs clusters that have been divided into two or more sub-clusters.
Two clusters are merged if all properties, or all properties except the
center, are similar (measured by the Jensen-Shannon divergence, see section
3.7.1).
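The merge criterion can be sketched as below. The sketch assumes that each property distribution has been discretised on a common grid into probabilities that sum to 1, uses base-2 logarithms, and introduces a hypothetical similarity threshold; none of these choices are taken from section 3.7.1. Since "all properties similar" implies "all properties except the center similar", only the second condition needs to be checked.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded between 0 and 1."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def should_merge(props_a, props_b, threshold=0.1, ignore=("center",)):
    """props_*: dict mapping property name -> discretised distribution.

    `threshold` is a hypothetical cut-off for "similar".
    """
    names = [name for name in props_a if name not in ignore]
    return all(
        js_divergence(props_a[name], props_b[name]) <= threshold for name in names
    )

# Hypothetical sub-clusters: same height distribution, different centers.
cluster_a = {"center": [1.0, 0.0], "height": [0.2, 0.5, 0.3]}
cluster_b = {"center": [0.0, 1.0], "height": [0.2, 0.5, 0.3]}
```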
Outcome: List <strong>of</strong> Masterpeaks<br />
The above procedure finally results in a list of masterpeaks, that is, a list
of peaks clustered by position and properties (such as position or height),
each represented by the average values of these properties.
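A masterpeak in this sense could be represented as in the following sketch. The `Peak` type and its two fields are assumptions chosen for illustration; the actual set of properties PP is larger.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Peak:
    position: float
    height: float

def masterpeak(cluster):
    """Represent a cluster of peaks by the average of each property."""
    return Peak(
        position=mean([p.position for p in cluster]),
        height=mean([p.height for p in cluster]),
    )

# Hypothetical cluster of two peaks from different spectra.
cluster = [Peak(1000.0, 10.0), Peak(1002.0, 14.0)]
master = masterpeak(cluster)
```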
3.6.2 Other Approaches
An alternative approach to making different spectra comparable is to align
them, as described below. This means: define some reference key peaks (e.g.
based on known house-keeping molecules), find these peaks in each spectrum
and reorientate each spectrum towards these peaks. Obvious problems are:
• How to detect the key peaks?
• What if key peaks are missing?
• What if there is a peak similar and next to a key peak? Which is the
right one?
The main issue here is that during the reorientation process a spectrum
gets partly distorted, because between two key peaks a linear adjustment takes
place but the parts are not directly tied together.
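The linear adjustment between key peaks can be sketched as a piecewise linear mapping of positions. The function name and the key peak positions are assumptions made for illustration. Each segment between two neighbouring key peaks gets its own scale factor, which is where the distortion comes from.

```python
from bisect import bisect_right

def align_position(x, detected, reference):
    """Map position x by linear interpolation between neighbouring key peaks.

    `detected` and `reference` are sorted lists of equal length (>= 2) that
    hold the key peak positions in the spectrum and in the reference.
    """
    i = bisect_right(detected, x) - 1
    i = max(0, min(i, len(detected) - 2))  # outer segments are extrapolated
    d0, d1 = detected[i], detected[i + 1]
    r0, r1 = reference[i], reference[i + 1]
    return r0 + (x - d0) * (r1 - r0) / (d1 - d0)

# Hypothetical key peak positions.
detected = [1000.0, 2000.0, 3000.0]
reference = [1002.0, 2000.0, 2997.0]

left = align_position(1500.0, detected, reference)   # scale factor 0.998
right = align_position(2500.0, detected, reference)  # scale factor 0.997
```

In this example the two positions are 1000 units apart before alignment and 997.5 units apart afterwards, because the two segments are stretched by different factors.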
The distortion is shown exemplarily in Figure 3.6.15: spectrum (b) is reorientated
on the basis of its detected key peaks towards the position of the reference key
15 Following the Parzen window approach with a Gaussian kernel.
Figure 3.6.14: These are the parameters being determined for each peak.