
New Statistical Algorithms for the Analysis of Mass - FU Berlin, FB MI ...


3.6. PEAK REGISTRATION (ALIGNMENT)

PP being the set of peak properties. DA(Ci) is constructed as follows: Kernel Density Estimation (KDE, see footnote 15) (Scheather, 2004) is performed on A(xk) ∀ xk ∈ Ci, then interpolated and scaled to result in a probability density function.
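The construction above can be sketched in a few lines. This is only an illustration, assuming a Gaussian kernel (as in footnote 15), a fixed evaluation grid and a hand-picked bandwidth; the function names, the sample values and the bandwidth are hypothetical and not taken from the thesis.

```python
import math


def trapezoid(xs, ys):
    """Trapezoidal-rule integral of ys over the grid xs."""
    return sum(
        0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    )


def gaussian_kde_pdf(samples, grid, bandwidth):
    """Parzen-window estimate with a Gaussian kernel, evaluated on `grid`
    and rescaled so that it integrates to 1 over that grid."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    density = [
        norm * sum(math.exp(-0.5 * ((g - s) / bandwidth) ** 2) for s in samples)
        for g in grid
    ]
    area = trapezoid(grid, density)  # slightly below 1 on a truncated grid
    return [d / area for d in density]


# Hypothetical property values of the peaks in one cluster.
values = [1.0, 1.2, 0.9, 3.0, 3.1]
grid = [i * 0.05 for i in range(101)]  # 0.0 .. 5.0
pdf = gaussian_kde_pdf(values, grid, bandwidth=0.3)  # bandwidth is an assumption
```

The resulting curve has two modes (near 1 and near 3), which is the kind of non-unimodal shape the text reports for DA(Ci).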

We have used this particular clustering approach since preliminary experiments have shown that this technique was best able to resolve the clusters in the |PP|-dimensional space we work in (typically |PP| = 6 ... 8). Analyses have shown that the DA(Ci) are usually not unimodal, thus favoring a clustering schema that, unlike k-means clustering, does not create spherical clusters (see e.g. (Deonia et al., 2007)).

Determine Cluster Properties<br />

This step determines the properties (such as center, height, ...) of the clusters found in the previous step. For each property, a KDE k is computed from the values of the single peaks that make up the masterpeak. If k is not similar to a normal distribution (tested by: Kolmogorov-Smirnov test, Lilliefors test, Anderson-Darling test, Ryan-Joiner test, Shapiro-Wilk test, Normal probability plot (rankit plot), Normality test, Jarque-Bera test), the masterpeak is flagged.
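As a rough illustration of the flagging rule, the sketch below uses only one of the listed tests, the Jarque-Bera test, and applies it directly to the raw property values rather than to the KDE k. Both are simplifications. The significance level, the function names and the sample data are assumptions, and the chi-square approximation used for the p-value is only reliable for reasonably large samples.

```python
import math
from statistics import NormalDist


def jarque_bera(values):
    """Return the Jarque-Bera statistic and its asymptotic p-value."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # JB is asymptotically chi-square with 2 degrees of freedom,
    # whose survival function is exp(-x / 2).
    return jb, math.exp(-jb / 2.0)


def flag_masterpeak(values, alpha=0.05):
    """True if normality is rejected at level alpha (alpha is an assumption)."""
    _, p_value = jarque_bera(values)
    return p_value < alpha


# Hypothetical data: evenly spaced normal quantiles vs. exponential quantiles.
normal_like = [NormalDist().inv_cdf((i + 0.5) / 200) for i in range(200)]
skewed = [-math.log(1.0 - (i + 0.5) / 200) for i in range(200)]
```

With these inputs, `flag_masterpeak(normal_like)` leaves the masterpeak unflagged, while `flag_masterpeak(skewed)` flags it.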

Merge Similar Clusters<br />

Since the clustering in the previous step is not deterministic, this step repairs clusters that have been divided into two or more sub-clusters. Two clusters are merged together if all properties or all properties except the center are similar (measured by the Jensen-Shannon divergence, see section 3.7.1).
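The merge rule can be sketched as follows, using the standard definition of the Jensen-Shannon divergence for discretised densities (base-2 logarithm, so the value lies in [0, 1]). The threshold, the dictionary layout and the property names are hypothetical; the thesis's own definition is the one in section 3.7.1.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence of two discrete distributions on the same bins."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def should_merge(props_a, props_b, threshold=0.05):
    """props_*: dict mapping property name -> discretised density (sums to 1).

    "All properties similar, or all except the center similar" reduces to
    checking every property other than the center.
    """
    names = [name for name in props_a if name != "center"]
    return all(
        jensen_shannon(props_a[name], props_b[name]) < threshold for name in names
    )


# Hypothetical clusters: different centers, identical height distributions.
cluster_a = {"center": [1.0, 0.0], "height": [0.5, 0.5]}
cluster_b = {"center": [0.0, 1.0], "height": [0.5, 0.5]}
cluster_c = {"center": [1.0, 0.0], "height": [1.0, 0.0]}
```

Here `should_merge(cluster_a, cluster_b)` holds because only the centers differ, whereas `cluster_a` and `cluster_c` stay separate because their height distributions diverge.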

Outcome: List <strong>of</strong> Masterpeaks<br />

The above procedure finally results in a list of masterpeaks, that is, a list of peaks clustered by position and properties, each represented by the average values of these properties (such as position or height).
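A minimal sketch of how one masterpeak could be summarised by per-property averages is shown below. The dictionary representation and the property names are assumptions made for illustration only.

```python
from statistics import mean


def summarize_masterpeak(peaks):
    """Average every property over the single peaks of one cluster.

    peaks: list of dicts that all share the same property names.
    """
    names = peaks[0].keys()
    return {name: mean([peak[name] for peak in peaks]) for name in names}


# Hypothetical cluster of two peaks from different spectra.
cluster = [
    {"position": 1000.0, "height": 10.0},
    {"position": 1002.0, "height": 14.0},
]
masterpeak = summarize_masterpeak(cluster)
```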

3.6.2 Other Approaches

An alternative approach to making different spectra comparable is to align them, as described below. This means: define some reference key peaks (e.g. based on known house-keeping molecules), find these peaks in each spectrum and reorient each spectrum towards these peaks. Obvious problems are:

• How to detect the key peaks?

• What if key peaks are missing?

• What if there is a similar peak next to a key peak? Which is the right one?

The main issue here is that during the reorientation process a spectrum gets partly distorted, because between two key peaks a linear adjustment takes place but the parts are not directly tied together.
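The linear adjustment between key peaks can be illustrated with a piecewise linear mapping of positions. This is a sketch under assumptions: the key-peak positions are invented, the function name is hypothetical, and positions outside the outermost key peaks are simply extrapolated from the nearest segment.

```python
from bisect import bisect_right


def align_positions(positions, detected_keys, reference_keys):
    """Map positions so that each detected key peak lands on its reference.

    Between two neighbouring key peaks the mapping is linear; each segment
    gets its own scale factor, which is where the distortion comes from.
    Both key lists must be sorted and of equal length (at least 2).
    """
    def warp(x):
        # Index of the segment [detected_keys[i], detected_keys[i + 1]].
        i = bisect_right(detected_keys, x) - 1
        i = max(0, min(i, len(detected_keys) - 2))
        d0, d1 = detected_keys[i], detected_keys[i + 1]
        r0, r1 = reference_keys[i], reference_keys[i + 1]
        return r0 + (x - d0) * (r1 - r0) / (d1 - d0)

    return [warp(x) for x in positions]


# Hypothetical key peaks: detected in one spectrum vs. reference positions.
detected = [100.0, 200.0, 300.0]
reference = [102.0, 200.0, 296.0]
aligned = align_positions([100.0, 150.0, 200.0, 250.0, 300.0], detected, reference)
```

In this example the first segment is scaled by 0.98 and the second by 0.96, so distances between peaks are changed differently on either side of the middle key peak.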

This is illustrated in Figure 3.6.15: spectrum (b) is reoriented on the basis of its detected key peaks towards the position of the reference key

15 Following the Parzen Window approach with Gaussian Kernel.

Figure 3.6.14: These are the parameters being determined for each peak.
