
New Statistical Algorithms for the Analysis of Mass - FU Berlin, FB MI ...



3.5 Peak Detection in 2D Maps

• Create a 2D (orthogonal) range tree (de Berg et al., 2000). It needs $O(n \log n)$ time for creation and $O(n \log n)$ space for storage¹¹, and it answers range queries in $O((\log n)^2 + k)$ time¹², where $k$ is the number of results. In a typical (medium-resolution) map $n \approx 2{,}000{,}000$, so $\log_2 n \approx 21$: a query needs on the order of $440 + k$ comparisons, and the tree holds about $n \log_2 n \approx 4.2 \times 10^7$ point entries.
  Of course, the analysis could also be performed by querying the database directly, but with range trees it can be done on a remote worker (see section 5.4) without using the potentially slow database connection.

• For each spectrum $S_t$ (sorted by increasing retention time $t$): if a not yet processed peak is found at position $(m/z, t)$:
  – Use this peak as a seed and extend a bounding box around it.

    ◦ The extension in m/z (x) direction is given by the length of the isotope pattern plus 10% of that length on each side (e.g., if an isotope pattern spans from 1000 to 1010 Da, the box spans 999 to 1011 Da along x).

    ◦ In retention-time (y) direction, the box is extended successively towards $(t - i)$ and $(t + i)$ as follows: if $S_i$ is the current 1D spectrum, get the peaks of the next spectrum $S_j = S_{i \pm 1}$ within the determined x range. If these peaks are similar (see below) to the peaks of $S_i$, $S_j$ becomes the current spectrum and this step is repeated until no further extension is possible.
      If the peaks found are not similar, the next two spectra $S_k = S_{i \pm 2}$ and $S_l = S_{i \pm 3}$ are checked as well to account for missing data. If the peaks of $S_k$ or $S_l$ are similar to the peaks of $S_i$, the respective spectrum becomes the current one and this step starts over.

  – Fit a 2D isotope pattern (a mixture of 2D Gaussians) to the peaks found within the bounding box. For each 2D Gaussian¹³, the center of mass along the retention-time axis and the spread along the m/z axis can be computed from the 1D isotope patterns (each itself a mixture of Gaussians).

  – Mark the used peaks as processed.
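The first step, building the range tree, can be sketched in Python as follows. This is a minimal illustration under simplifying assumptions, not the implementation used in this work: peaks are reduced to bare (m/z, retention time) pairs, and the names `Node`, `build` and `query` are hypothetical. Each node of the primary tree (split on m/z) keeps all points of its subtree sorted by retention time, merged from its children's lists; a query descends to the O(log n) canonical nodes whose m/z interval lies inside the box and binary-searches each of them.

```python
import heapq
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (m/z, retention time)


@dataclass
class Node:
    x_min: float              # smallest m/z in this subtree
    x_max: float              # largest m/z in this subtree
    by_y: List[Point]         # all subtree points, sorted by retention time
    ys: List[float]           # retention times of by_y, for bisect
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(pts: List[Point]) -> Optional[Node]:
    """Build over points sorted by m/z; children's lists are merged, so the
    whole tree takes O(n log n) time and space."""
    if not pts:
        return None
    if len(pts) == 1:
        return Node(pts[0][0], pts[0][0], list(pts), [pts[0][1]])
    mid = len(pts) // 2
    left, right = build(pts[:mid]), build(pts[mid:])
    by_y = list(heapq.merge(left.by_y, right.by_y, key=lambda p: p[1]))
    return Node(pts[0][0], pts[-1][0], by_y, [p[1] for p in by_y], left, right)


def query(node: Optional[Node], x_lo: float, x_hi: float,
          y_lo: float, y_hi: float, out: List[Point]) -> None:
    """Append all points with x_lo <= m/z <= x_hi and y_lo <= rt <= y_hi to out."""
    if node is None or node.x_max < x_lo or node.x_min > x_hi:
        return  # subtree lies entirely outside the m/z range
    if x_lo <= node.x_min and node.x_max <= x_hi:
        # Canonical node: its m/z range is covered, so one binary search
        # on the rt-sorted list reports all of its hits.
        out.extend(node.by_y[bisect_left(node.ys, y_lo):bisect_right(node.ys, y_hi)])
        return
    query(node.left, x_lo, x_hi, y_lo, y_hi, out)
    query(node.right, x_lo, x_hi, y_lo, y_hi, out)


peaks = sorted([(1000.1, 12.0), (1000.6, 12.5), (1001.1, 13.0), (1500.0, 12.0)])
tree = build(peaks)
hits: List[Point] = []
query(tree, 999.0, 1011.0, 11.5, 12.75, hits)
print(sorted(hits))  # [(1000.1, 12.0), (1000.6, 12.5)]
```

In the procedure above, the box computed for a seed would supply `x_lo`, `x_hi`, `y_lo` and `y_hi`; the fractional-cascading variant of footnote 12 is not attempted here.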

This procedure results in a list of 2D peaks parameterized by a mixture of 2D Gaussians.
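The seed-and-extend step of the procedure can be sketched as below, assuming each spectrum is a list of (m/z, intensity) pairs indexed in increasing retention-time order. `mz_window`, `peaks_in`, `extend_rt` and `bounding_box` are hypothetical names; `peaks_in` scans linearly instead of querying a range tree, and the similarity test is passed in as a callback because the actual criterion (section 3.5.1) is described, not given as code. The walk in each retention-time direction is run independently, which yields the same box as alternating between $(t - i)$ and $(t + i)$.

```python
from typing import Callable, List, Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)
Spectrum = Sequence[Peak]   # spectra are indexed in increasing retention-time order
Similar = Callable[[List[Peak], List[Peak]], bool]


def mz_window(pattern_lo: float, pattern_hi: float, margin: float = 0.10) -> Tuple[float, float]:
    """Pattern extent widened by `margin` of its length on each side (1000-1010 -> 999-1011)."""
    pad = margin * (pattern_hi - pattern_lo)
    return pattern_lo - pad, pattern_hi + pad


def peaks_in(spectrum: Spectrum, lo: float, hi: float) -> List[Peak]:
    # Linear scan for self-containment; the procedure itself would use a range query.
    return [p for p in spectrum if lo <= p[0] <= hi]


def extend_rt(spectra: Sequence[Spectrum], seed: int, lo: float, hi: float,
              similar: Similar, step: int, lookahead: int = 3) -> int:
    """Walk from spectrum `seed` in direction `step` (-1 or +1) and return the index
    of the last spectrum added to the box. S_{i±1} is tried first; if it is not
    similar, S_{i±2} and S_{i±3} are tried to bridge missing data."""
    current = seed
    ref = peaks_in(spectra[current], lo, hi)
    while True:
        for offset in range(1, lookahead + 1):
            j = current + step * offset
            if not 0 <= j < len(spectra):
                return current
            candidate = peaks_in(spectra[j], lo, hi)
            if candidate and similar(ref, candidate):
                current, ref = j, candidate  # S_j becomes the current spectrum
                break
        else:  # none of the next `lookahead` spectra matched
            return current


def bounding_box(spectra: Sequence[Spectrum], seed: int, pattern_lo: float,
                 pattern_hi: float, similar: Similar) -> Tuple[float, float, int, int]:
    """Return (mz_lo, mz_hi, first_spectrum, last_spectrum) of the seed's box."""
    lo, hi = mz_window(pattern_lo, pattern_hi)
    first = extend_rt(spectra, seed, lo, hi, similar, step=-1)
    last = extend_rt(spectra, seed, lo, hi, similar, step=+1)
    return lo, hi, first, last


def same_count(a: List[Peak], b: List[Peak]) -> bool:
    # Hypothetical stand-in for the turning-function similarity test.
    return len(a) == len(b)


# Toy data: scan 3 is empty (missing data), scans 5-7 hold an unrelated peak.
spectra = [
    [(1000.0, 5.0), (1001.0, 3.0)],
    [(1000.0, 8.0), (1001.0, 5.0)],
    [(1000.0, 9.0), (1001.0, 6.0)],
    [],
    [(1000.0, 4.0), (1001.0, 2.0)],
    [(1500.0, 7.0)],
    [(1500.0, 7.0)],
    [(1500.0, 7.0)],
]
print(bounding_box(spectra, 2, 1000.0, 1001.0, same_count)[2:])  # (0, 4)
```

The lookahead of three spectra is what lets the box bridge the empty scan 3; with `lookahead=1` the forward walk would stop at the seed.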

3.5.1 Similarity Measures for Curves

The similarity (or distance) $d(A, B)$ of two curves $A$ and $B$ (e.g. isotope patterns modeled as mixtures of 1D Gaussians) is measured by the difference of their curvature-based turning functions $\Theta_A$ and $\Theta_B$ (Arkin et al., 1991). This measure is well suited to capture differences in isotope-pattern shape but is not sensitive to height differences; see Veltkamp and Hagedoorn (2000) for a discussion of different similarity measures.
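A minimal sketch of such a distance for sampled curves follows, assuming each curve is a polyline of (m/z, intensity) samples with strictly increasing m/z. The turning function maps normalized arc length $s \in [0, 1]$ to the tangent angle, and the distance is $d(A, B) = \left(\int_0^1 (\Theta_A(s) - \Theta_B(s))^2 \, ds\right)^{1/2}$. Two simplifications are assumptions of this sketch: both curves are first scaled to unit m/z span and unit maximum height (one way to obtain the height insensitivity mentioned above), and the minimization over rotation and starting point that Arkin et al. use for closed polygons is omitted. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Curve = Sequence[Tuple[float, float]]  # (m/z, intensity) samples, strictly increasing m/z


def normalize(curve: Curve) -> List[Tuple[float, float]]:
    """Scale to unit m/z span and unit maximum height (assumption of this sketch)."""
    x0, x1 = curve[0][0], curve[-1][0]
    y_max = max(y for _, y in curve) or 1.0
    return [((x - x0) / (x1 - x0), y / y_max) for x, y in curve]


def turning_function(curve: Curve) -> List[Tuple[float, float]]:
    """Step function as (right breakpoint s, tangent angle) pairs, one per polyline edge."""
    edges = list(zip(curve, curve[1:]))
    lengths = [math.hypot(b[0] - a[0], b[1] - a[1]) for a, b in edges]
    total = sum(lengths)
    steps, s = [], 0.0
    for (a, b), length in zip(edges, lengths):
        s += length / total
        steps.append((s, math.atan2(b[1] - a[1], b[0] - a[0])))
    steps[-1] = (1.0, steps[-1][1])  # guard against rounding in the last breakpoint
    return steps


def turning_distance(a: Curve, b: Curve) -> float:
    """L2 distance of the turning functions, integrated exactly over merged breakpoints."""
    ta, tb = turning_function(normalize(a)), turning_function(normalize(b))
    i = j = 0
    s_prev, acc = 0.0, 0.0
    while i < len(ta) and j < len(tb):
        s_next = min(ta[i][0], tb[j][0])
        acc += (s_next - s_prev) * (ta[i][1] - tb[j][1]) ** 2
        s_prev = s_next
        if ta[i][0] == s_next:
            i += 1
        if tb[j][0] == s_next:
            j += 1
    return math.sqrt(acc)


a = [(1000.0, 0.0), (1000.5, 1.0), (1001.0, 0.2), (1001.5, 0.6), (1002.0, 0.0)]
b = [(x, 2.0 * y) for x, y in a]  # same shape, twice as high
c = [(1000.0, 0.0), (1000.5, 0.2), (1001.0, 1.0), (1001.5, 0.6), (1002.0, 0.0)]
print(turning_distance(a, b), turning_distance(a, c) > 0.1)  # 0.0 True
```

Because the turning functions are step functions, the integral is evaluated exactly by summing over the merged breakpoints rather than by numerical quadrature.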

¹¹ Optimal storage of $O(n \log n / \log \log n)$ is possible by using R-trees.

¹² Or $O(\log n + k)$ in an improved version using fractional cascading.

¹³ $f(x, y) = A \cdot \exp\left(-\frac{(x - x_0)^2}{2\sigma_x^2} - \frac{(y - y_0)^2}{2\sigma_y^2}\right)$, where $A$ is the amplitude, $x_0, y_0$ the center, and $\sigma_x, \sigma_y$ the x and y spreads.
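The 2D Gaussian model and the center-of-mass computation can be sketched as follows, assuming each fitted component is fully described by its five parameters (the fitting itself is not shown). `Gaussian2D`, `mixture` and `rt_center_of_mass` are hypothetical names. Since each component integrates to $2\pi A \sigma_x \sigma_y$ and has its retention-time centroid at $y_0$, the center of mass of a mixture along the retention-time axis is the volume-weighted mean of the $y_0$ values.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Gaussian2D:
    """f(x, y) = A exp(-(x-x0)^2/(2 sx^2) - (y-y0)^2/(2 sy^2)), x = m/z, y = rt."""
    amplitude: float  # A
    x0: float         # center along m/z
    y0: float         # center along retention time
    sigma_x: float    # spread along m/z
    sigma_y: float    # spread along retention time

    def __call__(self, x: float, y: float) -> float:
        return self.amplitude * math.exp(
            -((x - self.x0) ** 2) / (2 * self.sigma_x ** 2)
            - ((y - self.y0) ** 2) / (2 * self.sigma_y ** 2)
        )

    def volume(self) -> float:
        """Integral of f over the plane: 2*pi*A*sigma_x*sigma_y."""
        return 2 * math.pi * self.amplitude * self.sigma_x * self.sigma_y


def mixture(components: Iterable[Gaussian2D], x: float, y: float) -> float:
    """Intensity of a 2D peak modeled as a sum of 2D Gaussians."""
    return sum(g(x, y) for g in components)


def rt_center_of_mass(components: Iterable[Gaussian2D]) -> float:
    """Volume-weighted mean of the components' retention-time centers y0."""
    comps = list(components)
    total = sum(g.volume() for g in comps)
    return sum(g.volume() * g.y0 for g in comps) / total


isotopes = [Gaussian2D(10.0, 1000.0, 12.0, 0.05, 0.2),
            Gaussian2D(5.0, 1001.0, 12.4, 0.05, 0.2)]
print(round(mixture(isotopes, 1000.0, 12.0), 6))  # 10.0
print(round(rt_center_of_mass(isotopes), 4))      # 12.1333
```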
