New Statistical Algorithms for the Analysis of Mass - FU Berlin, FB MI ...
6.2. CASE STUDIES
MASCOT and SEQUEST (Perkins et al., 1999; J et al., 1994) are examples of
this approach, which employ sophisticated statistical models to determine the
similarity of experimental and theoretical spectra.
Both approaches suffer from the fact that peptide fragmentation is a complex
(biochemical) process, so spectra generated by mass spectrometers
are often significantly different from their theoretical counterparts. Nevertheless,
database approaches have been reported to work quite effectively when used
with standard data. Unfortunately, performance deteriorates significantly in
certain settings, such as a high abundance of homologous proteins, a lack of sequence
data or the presence of peptide modifications. In such scenarios de novo
methods provide an invaluable alternative because they infer protein sequences
without using existing sequence data, while additionally accounting for possible
peptide modifications.
Available Projects<br />
As mentioned in the previous section, building a protein/peptide identification
algorithm is a complicated task, and many sophisticated scientific and
commercial projects already address it. These are available as stand-alone
programs, web sites or web services. The integration of stand-alone programs
is described in section 6.2.5, so in this example we focus on the integration of
web-based services.
Approach<br />
To integrate web-based services into the proteomics.net framework we extended
the base QAD Grid worker (see section 5.4.1) with methods for submitting
web forms (GET & POST), parsing HTML code and calling web services. With
these new methods it becomes possible to use web pages (such as the Mascot services)
and web services (such as Emboss’ emowse service from the Helmholtz
Open BioInformatics Technology initiative, based on (Pappin et al., 1993)).
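
The two web-form mechanisms and the HTML parsing step can be illustrated with a minimal Python sketch built on the standard library only. It is not the QAD Grid worker code: the function and class names, the URL and the form fields are all hypothetical, and a real service such as Mascot defines its own form fields and result layout.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request


def build_form_request(url, fields, method="POST"):
    """Build (but do not send) a web-form request for the given fields."""
    encoded = urlencode(fields)
    if method.upper() == "GET":
        # GET: the parameters travel in the query string.
        return Request(f"{url}?{encoded}", method="GET")
    # POST: the parameters travel in the request body.
    return Request(
        url,
        data=encoded.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


class TableCellParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def parse_result_table(html):
    """Return the table rows of an HTML result page as lists of strings."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    return [row for row in parser.rows if row]
```

Passing the returned request object to `urllib.request.urlopen` would actually send it; the sketch stops before that step so that it can be run without network access.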
To use these services on the proteomics.net platform we implemented a
protein ID worker that
• takes the ID of a dataset (peak list) available on the platform, the desired
ID service and the required parameters as input,
• sends the request to the chosen service,
• waits for the answer,
• parses the result and converts it to the format used in the proteomics.net
framework,
• inserts the result into the database and links it to the source data.
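
The five steps listed above can be sketched as a small Python class. This is only an illustration under stated assumptions: `ProteinIdWorker`, `Identification` and `fake_service` are hypothetical names, the tab-separated answer format is invented for the example, and plain dictionaries and lists stand in for the platform's peak list storage and database.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Identification:
    """Hypothetical platform-internal result record."""
    dataset_id: int   # link back to the source peak list
    service: str
    protein: str
    score: float


class ProteinIdWorker:
    """Sketch of the five worker steps; all names are illustrative."""

    def __init__(
        self,
        peaklists: Dict[int, List[float]],
        services: Dict[str, Callable[[List[float], dict], str]],
        results: List[Identification],
    ):
        self.peaklists = peaklists   # dataset ID -> list of peak masses
        self.services = services     # service name -> query function
        self.results = results       # stand-in for the platform database

    def run(self, dataset_id: int, service: str, params: dict) -> List[Identification]:
        # 1. Resolve the inputs: dataset ID, chosen service, parameters.
        peaks = self.peaklists[dataset_id]
        query = self.services[service]
        # 2. and 3. Send the request and block until the answer arrives.
        raw = query(peaks, params)
        # 4. Parse the raw answer and convert it to the internal format.
        hits = [
            Identification(dataset_id, service, name, float(score))
            for name, score in (line.split("\t") for line in raw.splitlines() if line)
        ]
        # 5. Store the hits; dataset_id links each one to its source data.
        self.results.extend(hits)
        return hits


def fake_service(peaks: List[float], params: dict) -> str:
    """Offline stand-in for a remote ID service; returns tab-separated hits."""
    return f"PROT_A\t{len(peaks) * 10}\nPROT_B\t{params['tolerance']}\n"
```

A call such as `ProteinIdWorker({7: [1000.5, 1200.7]}, {"fake": fake_service}, []).run(7, "fake", {"tolerance": 0.5})` walks through all five steps without contacting any real service.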
This ID worker not only enables the integration of the protein identification
service into the proteomics.net platform, but also allows execution of many
queries in parallel and full integration into the workflow system.
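
Why parallel execution pays off for such queries can be shown with a short Python sketch. It uses a local thread pool rather than the grid workers of the framework, so it is an analogy and not the platform's mechanism; `identify` and `identify_many` are hypothetical names, and `identify` merely stands in for one blocking request to a remote service.

```python
from concurrent.futures import ThreadPoolExecutor


def identify(dataset_id):
    """Stand-in for one blocking request to a remote ID service."""
    return dataset_id, f"result-for-{dataset_id}"


def identify_many(dataset_ids, max_workers=4):
    """Run one query per dataset ID concurrently and collect the results."""
    # Threads suit this workload: each query mostly waits on the network.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, regardless of finish order.
        return dict(pool.map(identify, dataset_ids))
```

Because each query spends most of its time waiting for the remote answer, several requests can be in flight at once without competing for local compute time.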