New Statistical Algorithms for the Analysis of Mass - FU Berlin, FB MI ...
3.5. PEAK DETECTION IN 2D MAPS
• Create a 2D (orthogonal) range tree (de Berg et al., 2000), which needs
O(n log n) time and space for creation and storage¹¹, respectively, and
can answer range queries in O((log n)² + k) time¹² (k being the number of
results). Since n ∼ 2,000,000 in a typical (medium resolution) map, a
query needs about (36 + k) comparisons, and the tree takes about 12 MB of space.
Of course, the analysis could also be performed directly by querying the
database, but with range trees it can be done on a remote worker
(see section 5.4) without relying on the potentially slow database
connection.
• For each spectrum S_t (sorted by increasing retention time t): if a yet
unprocessed peak is found at position (m/z, t):
  – Use this peak as seed and extend a bounding box around it.
    • The extension in m/z (x) direction is given by the length of
      the isotope pattern plus 10% of that length on each side (that is,
      if an isotope pattern spans from 1000 to 1010 Da, the box has the
      x-dimension 999 to 1011 Da).
    • In retention time (y) direction (successively towards (t − i) and
      (t + i)), the extension is done as follows: if S_i is the current 1D
      spectrum, get the peaks of the next spectrum (S_j = S_{i±1})
      within the determined x range. If the peaks found are similar
      (see below) to the peaks of S_i, S_j becomes the current spectrum
      and this step is repeated until no further extension is possible.
      If the peaks found are not similar, the next two spectra (S_k =
      S_{i±2} and S_l = S_{i±3}) are checked as well to account for missing
      data. If the peaks of S_k or S_l are similar to the peaks of S_i,
      the respective spectrum is set as the current one and this step
      starts all over.
  – Fit a 2D isotope pattern (mixture of 2D Gaussians) to the peaks found
    within the bounding box. For each 2D Gaussian¹³, the center of mass
    along the retention time axis and the spread along the m/z axis can be
    computed from the 1D isotope patterns (each consisting of a mixture of
    Gaussians).
  – Mark the used peaks as processed.
This procedure results in a list of 2D peaks parameterized by a mixture of
2D Gaussians.
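The bounding box extension described above can be sketched as follows. This is a minimal illustration under several assumptions, not the implementation used in this work: each spectrum is a sorted list of m/z values, a binary search per spectrum stands in for the 2D range tree, and `similar` is a hypothetical placeholder (same peak count, positions within a tolerance `tol`) for the curve similarity of section 3.5.1. The names `peaks_in_range`, `extend_box`, `max_gap` and `tol` are invented for the example.

```python
from bisect import bisect_left, bisect_right


def peaks_in_range(spectrum, lo, hi):
    """Return the m/z values of a sorted spectrum that lie in [lo, hi]."""
    return spectrum[bisect_left(spectrum, lo):bisect_right(spectrum, hi)]


def similar(a, b, tol=0.05):
    """Placeholder similarity test (assumption): equal, non-zero peak count
    and pairwise m/z positions within tol."""
    return len(a) == len(b) > 0 and all(abs(x - y) <= tol for x, y in zip(a, b))


def extend_box(spectra, seed_idx, mz_lo, mz_hi, max_gap=3):
    """Grow a bounding box around a seed isotope pattern.

    spectra      -- list of sorted m/z lists, ordered by retention time
    seed_idx     -- index of the spectrum that contains the seed
    mz_lo, mz_hi -- m/z span of the seed isotope pattern
    max_gap      -- spectra to look ahead: the neighbour plus two more
                    to bridge missing data
    Returns (lo, hi, first_idx, last_idx).
    """
    pad = 0.1 * (mz_hi - mz_lo)           # 10% of the pattern length per side
    lo, hi = mz_lo - pad, mz_hi + pad
    bounds = []
    for step in (-1, +1):                 # earlier spectra first, then later ones
        current = seed_idx
        while True:
            ref = peaks_in_range(spectra[current], lo, hi)
            for gap in range(1, max_gap + 1):
                cand = current + step * gap
                if 0 <= cand < len(spectra) and similar(
                        ref, peaks_in_range(spectra[cand], lo, hi)):
                    current = cand        # candidate becomes the current spectrum
                    break
            else:
                break                     # nothing similar within max_gap: stop
        bounds.append(current)
    return lo, hi, bounds[0], bounds[1]
```

For a seed pattern spanning 1000 to 1010 Da the returned m/z range is 999 to 1011 Da, matching the example in the text. Looking ahead up to three spectra lets the box bridge one or two spectra in which the pattern is missing.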
3.5.1 Similarity Measures for Curves
The similarity (or distance) d(A, B) of two curves A, B (e.g. isotope patterns
modeled as mixtures of 1D Gaussians) is measured by the difference of their
curvature-based turning functions Θ_A and Θ_B (Arkin et al., 1991); see Veltkamp
and Hagedoorn (2000) for a discussion of different similarity measures. This
measure is well suited to capture differences in isotope pattern shape but is
not sensitive to height differences.
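To make the idea concrete, the following sketch computes a turning function for a sampled curve (a polyline) and the L2 distance between two such functions. It is a simplified illustration and an assumption about how such a measure could be coded: the minimization over rotation and starting point used by Arkin et al. for closed polygons is omitted, and the names `turning_function` and `turning_distance` are hypothetical.

```python
import math


def turning_function(points):
    """Return (breaks, angles) for a polyline: the cumulative tangent angle of
    each segment as a step function of arc length normalised to [0, 1]."""
    segs = list(zip(points, points[1:]))
    lengths = [math.dist(p, q) for p, q in segs]
    total = sum(lengths)
    breaks, angles = [], []
    s, theta, prev = 0.0, 0.0, None
    for ((x0, y0), (x1, y1)), length in zip(segs, lengths):
        heading = math.atan2(y1 - y0, x1 - x0)
        if prev is None:
            theta = heading
        else:
            # add the signed turn between segments, wrapped to [-pi, pi)
            theta += (heading - prev + math.pi) % (2 * math.pi) - math.pi
        prev = heading
        s += length / total
        breaks.append(s)
        angles.append(theta)
    breaks[-1] = 1.0                      # guard against rounding drift
    return breaks, angles


def turning_distance(a, b):
    """L2 distance between the turning functions of polylines a and b."""
    (ba, ta), (bb, tb) = turning_function(a), turning_function(b)
    i = j = 0
    s = acc = 0.0
    while i < len(ba) and j < len(bb):
        nxt = min(ba[i], bb[j])           # next breakpoint of either function
        acc += (ta[i] - tb[j]) ** 2 * (nxt - s)
        s = nxt
        if ba[i] <= nxt:
            i += 1
        if bb[j] <= nxt:
            j += 1
    return math.sqrt(acc)
```

Because arc length is normalised, a uniform rescaling or a translation of a curve leaves the value unchanged in this sketch; for example, `[(0, 0), (1, 0), (1, 1)]` and `[(0, 0), (2, 0), (2, 2)]` have distance zero.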
¹¹ Optimal storage of O(n log n / log log n) is possible by using R-trees.
¹² Or O(log n + k) in an improved version using fractional cascading.
¹³ $f(x, y) = A \cdot \exp\left(-\frac{(x - x_0)^2}{2\sigma_x^2} - \frac{(y - y_0)^2}{2\sigma_y^2}\right)$, where A is the amplitude, (x_0, y_0) the center, and σ_x, σ_y the x and y spreads.