New Statistical Algorithms for the Analysis of Mass - FU Berlin, FB MI ...
CHAPTER 3. MATHEMATICAL MODELING AND ALGORITHMS
that a masterpeak is a set of peaks where each single peak stems from a distinct
spectrum and represents a molecule (or peptide) of a certain weight. If we
now find a masterpeak in two different groups at the same m/z position,
we have most likely found the same molecule (peptide) in both groups.
If these masterpeaks differ significantly in height, we can use them to
differentiate between the groups. However, if the two masterpeaks have similar
height, we call them noninformative and want them to be tagged as such.
Following this, we first have to match masterpeaks of the two groups that occur
at the same m/z position. This matching process is called Masterpeak Alignment.
Sometimes a masterpeak from group S1 cannot be matched with any masterpeak
from group S2 because the corresponding molecule (peptide) is simply not
present in S2. We have implemented two different approaches to solve this
problem: a naive version, and a formulation as a Minimum Weight Maximum
Cardinality Bipartite Matching, expressed as a Linear Program and solved by the
Munkres (Hungarian) Algorithm. The two approaches are compared in Section 3.7.3.
Approach 1: Naive Solution<br />
Algorithm 5 Masterpeak Assignment - Naive<br />
Require: Lists of masterpeaks MP1, MP2 of groups 1 & 2 respectively
while MP1 has more elements do
    MP1_cur ← next element from MP1
    MP2_candidates ← { s ∈ MP2 : |m/z(s) − m/z(MP1_cur)| ≤ 2 }
    if |MP2_candidates| = 0 then
        insert (MP1_cur, ∅) into L_MP_PAIRS
    else
        for all p ∈ MP2_candidates do
            insert (MP1_cur, p) into L_MP_PAIRS
            mark p (in list MP2) as processed
        end for
    end if
end while
for all p ∈ MP2 do
    if p is not marked as processed then
        insert (∅, p) into L_MP_PAIRS
    end if
end for
return L_MP_PAIRS: tuples of aligned pairs of masterpeaks.
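The naive assignment can be sketched in Python as follows. This is a minimal illustration, not the implementation used in this work: each masterpeak is reduced to its m/z value (a float), the function name `align_naive` and the parameter `tol` are hypothetical, and the pass over unmatched group-2 masterpeaks is assumed to run once, after the main loop, so that each unmatched peak is reported a single time.

```python
from typing import List, Optional, Tuple

# An aligned pair; None stands for the empty partner (the symbol ∅ in Algorithm 5).
Pair = Tuple[Optional[float], Optional[float]]


def align_naive(mp1: List[float], mp2: List[float], tol: float = 2.0) -> List[Pair]:
    """Pair every group-1 masterpeak with all group-2 masterpeaks within tol."""
    pairs: List[Pair] = []
    processed = [False] * len(mp2)  # marks group-2 peaks that found a partner

    for cur in mp1:
        # Indices of group-2 masterpeaks inside the m/z window around cur.
        candidates = [j for j, s in enumerate(mp2) if abs(s - cur) <= tol]
        if not candidates:
            pairs.append((cur, None))
        else:
            for j in candidates:
                pairs.append((cur, mp2[j]))
                processed[j] = True

    # Group-2 masterpeaks without any partner in group 1.
    for j, s in enumerate(mp2):
        if not processed[j]:
            pairs.append((None, s))
    return pairs


pairs = align_naive([1000.0, 1500.0, 2000.0], [1001.0, 1999.0, 2001.5, 3000.0])
```

In this toy input the group-1 masterpeak at 2000.0 has two candidates within the window (1999.0 and 2001.5) and is therefore paired with both.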
In this approach we simply check for masterpeaks in both groups that have
similar m/z values. An obvious problem is that a peak can be assigned
more than once, and no similarity measure is used to improve the quality of
the assignments.
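The multiple-assignment problem can be made visible with a small check on a list of aligned pairs. The sketch below is illustrative only: the helper name `multiply_assigned` is hypothetical, masterpeaks are represented by their m/z values, and `None` plays the role of the empty partner.

```python
from collections import Counter
from typing import List, Optional, Tuple

Pair = Tuple[Optional[float], Optional[float]]


def multiply_assigned(pairs: List[Pair]) -> Tuple[List[float], List[float]]:
    """Return the group-1 and group-2 masterpeaks that occur in more than one real pair."""
    # Only pairs with two real partners count as assignments.
    real = [(a, b) for a, b in pairs if a is not None and b is not None]
    left = Counter(a for a, _ in real)
    right = Counter(b for _, b in real)
    return (
        [p for p, n in left.items() if n > 1],
        [p for p, n in right.items() if n > 1],
    )


# Example pair list in which the group-1 peak at 2000.0 was paired twice.
example = [(1000.0, 1001.0), (2000.0, 1999.0), (2000.0, 2001.5), (1500.0, None)]
dup_left, dup_right = multiply_assigned(example)
```

Here `dup_left` contains 2000.0 and `dup_right` is empty; a one-to-one matching would leave both lists empty.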
Approach 2: Bipartite Graph Matching<br />
In the second approach we reformulate the problem as a Minimum Weight
Maximum Cardinality Bipartite Matching (assignment) problem. We are given
the two sets of masterpeaks (MP1, MP2), which can be seen as the vertices of a
graph. This graph G = (V, E) is bipartite because the vertex set V is subdivided
into two sets.
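To make the matching formulation concrete, the following sketch builds such a bipartite graph and finds a minimum weight maximum cardinality matching by exhaustive search. Several points are assumptions made for illustration and are not taken from the text: edges connect masterpeaks whose m/z values differ by at most `tol` (borrowing the window of Algorithm 5), the edge weight is the absolute m/z difference, and the names `build_edges` and `best_matching` are hypothetical. The exhaustive search is a stand-in for the Munkres algorithm and is only practical for very small inputs.

```python
from typing import Dict, FrozenSet, List, Tuple

Edge = Tuple[int, int]  # (index into mp1, index into mp2)


def build_edges(mp1: List[float], mp2: List[float], tol: float = 2.0) -> Dict[Edge, float]:
    """Edges between the two vertex sets, weighted by m/z distance (assumed weight)."""
    return {
        (i, j): abs(a - b)
        for i, a in enumerate(mp1)
        for j, b in enumerate(mp2)
        if abs(a - b) <= tol
    }


def best_matching(mp1: List[float], mp2: List[float], tol: float = 2.0) -> List[Edge]:
    """Exhaustive search: maximise the number of pairs first, then minimise total weight."""
    edges = build_edges(mp1, mp2, tol)

    def search(i: int, used: FrozenSet[int]) -> Tuple[int, float, List[Edge]]:
        if i == len(mp1):
            return (0, 0.0, [])
        best = search(i + 1, used)  # option: leave masterpeak i unmatched
        for j in range(len(mp2)):
            if (i, j) in edges and j not in used:
                card, weight, rest = search(i + 1, used | {j})
                cand = (card + 1, weight + edges[(i, j)], [(i, j)] + rest)
                # Prefer higher cardinality; break ties by lower total weight.
                if (cand[0], -cand[1]) > (best[0], -best[1]):
                    best = cand
        return best

    return search(0, frozenset())[2]


matching = best_matching([2000.0, 2001.0], [1999.0, 2001.5])
```

For this input every group-1 masterpeak lies within the window of both group-2 masterpeaks, so the naive approach would report four pairs. The matching instead keeps each peak at most once and selects the pairing with the smaller total m/z distance, (2000.0, 1999.0) and (2001.0, 2001.5).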