New Statistical Algorithms for the Analysis of Mass - FU Berlin, FB MI ...
CHAPTER 6. PROTEOMICS.NET - PRODUCT-ORIENTED CASE STUDIES
the platform, get the worker status information and display it can be written in fewer than 20 lines, as Listing 6.1 shows:
Listing 6.1: Minimal code for using the “get_current_worker()” web-service in Java.
 1 import org.apache.axis.client.Call;
 2 import org.apache.axis.client.Service;
 3 import javax.xml.namespace.QName;
 4 import de.fuberlin.mi.proteomics.proteomicsnetwebservices.*; // client stubs generated from the WSDL (e.g. by WSDL2Java)
 5
 6 public class clientjava {
 7     public static void main(String[] args) {
 8         try
 9         {
10             ServicesinfoLocator loc = new ServicesinfoLocator(); // locates the service endpoint
11             ServicesinfoSoap port = loc.getservicesinfoSoap();   // client-side proxy for the SOAP port
12             System.out.println(port.get_current_worker());       // synchronous web-service call
13         }
14         catch (Exception e)
15         { System.out.println(e.getMessage()); }                  // report connection or SOAP faults
16     }
17 }
The actual (synchronous) call to the web-service happens in line 12. This line could also contain far more complex calls, for example with objects as input and output parameters. Thanks to the transformation services (e.g. those of the WSDL2Java tool), parameters sent to and received from the web-service are mapped to Java data types automatically.
6.2.3 Integrating External Web Services: The Example of Protein Identification
Background<br />
One key issue in proteomics is to identify proteins and characterize their expression in cells. In mass-spectrometry-based proteomics this is done either by peptide mass fingerprinting (PMF) of MS1 spectra or by further fragmenting single peptides to produce MS2 spectra, from which (ideally) the amino acid sequence can be derived.
The PMF approach (also known as protein fingerprinting) is an analytical technique for protein identification developed in the early 1990s. The basic idea is to digest an unknown protein of interest with a sequence-specific protease (such as trypsin). The set of resulting peptides (fragments) forms a unique identifier (fingerprint) of the unknown protein for this protease. This fingerprint is then compared against databases of the peptide masses expected when known proteins are digested by the same protease. Obviously, the mass accuracy with which the peptides are measured plays a crucial role (Green et al., 1999): the tighter the mass tolerance, the fewer database peptides match a measured mass by chance.
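The PMF idea can be made concrete with a short Java sketch. It is a minimal illustration under simplifying assumptions, not the procedure of any particular search engine. Trypsin is assumed to cleave C-terminal to K or R unless the next residue is P. Missed cleavages and modifications are ignored, and peptide masses are monoisotopic, singly protonated [M+H]+ values. A database protein is scored simply by how many observed peaks match one of its theoretical peptide masses within a ppm tolerance. The class and method names, the toy sequences and the tolerance are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Toy peptide mass fingerprinting: in silico tryptic digest plus peak matching (illustrative only). */
public class PmfSketch {
    // Monoisotopic residue masses in Da (20 standard amino acids, no modifications).
    static final Map<Character, Double> RESIDUE = Map.ofEntries(
            Map.entry('G', 57.02146), Map.entry('A', 71.03711), Map.entry('S', 87.03203),
            Map.entry('P', 97.05276), Map.entry('V', 99.06841), Map.entry('T', 101.04768),
            Map.entry('C', 103.00919), Map.entry('L', 113.08406), Map.entry('I', 113.08406),
            Map.entry('N', 114.04293), Map.entry('D', 115.02694), Map.entry('Q', 128.05858),
            Map.entry('K', 128.09496), Map.entry('E', 129.04259), Map.entry('M', 131.04049),
            Map.entry('H', 137.05891), Map.entry('F', 147.06841), Map.entry('R', 156.10111),
            Map.entry('Y', 163.06333), Map.entry('W', 186.07931));
    static final double WATER = 18.01056;  // H2O completing the peptide termini
    static final double PROTON = 1.00728;  // charge carrier of a singly protonated ion

    /** Cleaves after K or R unless the next residue is P; no missed cleavages. */
    static List<String> trypticDigest(String protein) {
        List<String> peptides = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < protein.length(); i++) {
            char c = protein.charAt(i);
            boolean nextIsProline = i + 1 < protein.length() && protein.charAt(i + 1) == 'P';
            if ((c == 'K' || c == 'R') && !nextIsProline) {
                peptides.add(protein.substring(start, i + 1));
                start = i + 1;
            }
        }
        if (start < protein.length()) peptides.add(protein.substring(start)); // C-terminal remainder
        return peptides;
    }

    /** Monoisotopic [M+H]+ of a peptide; throws NullPointerException for unknown letters. */
    static double mh(String peptide) {
        double mass = WATER + PROTON;
        for (char c : peptide.toCharArray()) mass += RESIDUE.get(c);
        return mass;
    }

    /** Counts observed peaks lying within tolPpm of some theoretical peptide mass of the protein. */
    static int matchedPeaks(List<Double> observed, String protein, double tolPpm) {
        List<Double> theoretical = new ArrayList<>();
        for (String pep : trypticDigest(protein)) theoretical.add(mh(pep));
        int hits = 0;
        for (double obs : observed) {
            for (double th : theoretical) {
                if (Math.abs(obs - th) / th * 1e6 <= tolPpm) {
                    hits++;
                    break; // count each observed peak at most once
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical toy "database"; these are not real protein sequences.
        String[][] db = {{"toyA", "AKRPGKELWR"}, {"toyB", "GGKSSRDDFK"}};
        // Simulated peak list: two toyA peptides measured with a +2 ppm error.
        List<Double> observed = List.of(mh("RPGK") * (1 + 2e-6), mh("ELWR") * (1 + 2e-6));
        for (String[] entry : db) {
            System.out.println(entry[0] + ": " + matchedPeaks(observed, entry[1], 10.0) + " matched peaks");
        }
        // prints "toyA: 2 matched peaks" and then "toyB: 0 matched peaks"
    }
}
```

Lowering the tolerance from 10 ppm to 1 ppm in this toy example leaves toyA without any match, because the simulated peaks are off by 2 ppm. This is the trade-off behind the importance of mass accuracy. Real PMF tools replace the raw count with probability-based scores that account for how often matches occur by chance.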
In MS2 spectra analysis, peptides of interest, identified during an MS1 run, are fragmented further in a collision cell to produce tandem (MS/MS, MS2) mass spectra. Since fragmentation (usually) happens at the peptide bonds of the backbone, the amino acid sequence can in principle be determined by putting together matching pieces (that add up to the full peptide) and analyzing the points of rupture. This approach is called de novo sequencing (see e.g. Ma et al., 2003; Halligan et al., 2005, and references therein). The second large class of algorithms for the identification problem is based on comparing the experimental spectrum against a database of theoretical spectra generated by in silico digestion and fragmentation of known proteins.
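Both ideas above can be illustrated with a short Java sketch. It assumes singly charged ions and fragmentation only at backbone peptide bonds into b (N-terminal) and y (C-terminal) ions. It ignores modifications and neutral losses and uses a fixed tolerance in Da. The class name, the toy peptide "SAMPLER" and the tolerance are hypothetical.

The three methods play distinct roles:

- `theoreticalFragments` performs the in silico fragmentation used by database search.
- `sharedPeaks` is the simplest possible comparison score, a shared-peak count rather than the scoring function of any specific tool.
- `readLadder` shows the de novo principle: mass differences between consecutive fragment ions correspond to residue masses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Toy MS2 sketch: theoretical b/y ions, shared-peak scoring and a b-ion ladder read-out. */
public class Ms2Sketch {
    // Monoisotopic residue masses in Da (20 standard amino acids, no modifications).
    static final Map<Character, Double> RESIDUE = Map.ofEntries(
            Map.entry('G', 57.02146), Map.entry('A', 71.03711), Map.entry('S', 87.03203),
            Map.entry('P', 97.05276), Map.entry('V', 99.06841), Map.entry('T', 101.04768),
            Map.entry('C', 103.00919), Map.entry('L', 113.08406), Map.entry('I', 113.08406),
            Map.entry('N', 114.04293), Map.entry('D', 115.02694), Map.entry('Q', 128.05858),
            Map.entry('K', 128.09496), Map.entry('E', 129.04259), Map.entry('M', 131.04049),
            Map.entry('H', 137.05891), Map.entry('F', 147.06841), Map.entry('R', 156.10111),
            Map.entry('Y', 163.06333), Map.entry('W', 186.07931));
    static final double WATER = 18.01056;
    static final double PROTON = 1.00728;

    /** Singly charged ions from each backbone cleavage: b ions at even, y ions at odd indices. */
    static List<Double> theoreticalFragments(String peptide) {
        double total = 0;
        for (char c : peptide.toCharArray()) total += RESIDUE.get(c);
        List<Double> ions = new ArrayList<>();
        double prefix = 0;
        for (int i = 0; i < peptide.length() - 1; i++) {
            prefix += RESIDUE.get(peptide.charAt(i));
            ions.add(prefix + PROTON);                 // b ion: N-terminal piece
            ions.add(total - prefix + WATER + PROTON); // complementary y ion: C-terminal piece
        }
        return ions;
    }

    /** Counts experimental peaks explained by some theoretical fragment of the candidate. */
    static int sharedPeaks(List<Double> spectrum, String candidate, double tolDa) {
        List<Double> theoretical = theoreticalFragments(candidate);
        int hits = 0;
        for (double peak : spectrum) {
            for (double t : theoretical) {
                if (Math.abs(peak - t) <= tolDa) {
                    hits++;
                    break; // count each experimental peak at most once
                }
            }
        }
        return hits;
    }

    /** De novo principle: maps gaps between consecutive ladder peaks to residues ('?' if none fits). */
    static String readLadder(List<Double> ladder, double tolDa) {
        StringBuilder seq = new StringBuilder();
        for (int i = 1; i < ladder.size(); i++) {
            double gap = ladder.get(i) - ladder.get(i - 1);
            char best = '?';
            double bestErr = tolDa;
            for (Map.Entry<Character, Double> e : RESIDUE.entrySet()) {
                double err = Math.abs(e.getValue() - gap);
                if (err <= bestErr) {
                    bestErr = err;
                    best = e.getKey();
                }
            }
            seq.append(best == 'I' ? 'L' : best); // I and L have identical mass
        }
        return seq.toString();
    }

    public static void main(String[] args) {
        // Simulated, noise-free spectrum: all b/y ions of the toy peptide "SAMPLER".
        List<Double> spectrum = theoreticalFragments("SAMPLER");
        for (String candidate : List.of("SAMPLER", "SAMPLEK")) {
            System.out.println(candidate + ": " + sharedPeaks(spectrum, candidate, 0.02) + " shared peaks");
        }
        List<Double> bLadder = new ArrayList<>();
        for (int i = 0; i < spectrum.size(); i += 2) bLadder.add(spectrum.get(i)); // b1..b6
        System.out.println("read from b ladder: " + readLadder(bLadder, 0.02));
        // prints "SAMPLER: 12 shared peaks", "SAMPLEK: 6 shared peaks", "read from b ladder: AMPLE"
    }
}
```

The wrong candidate still shares all six b ions, because only its C-terminal residue differs. This is why real scoring functions weigh both ion series and peak intensities. The ladder read-out recovers only the residues between b1 and b6. The first residue would follow from b1 minus the proton mass. Leucine and isoleucine cannot be distinguished by mass at all.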