NPC Progress Meeting 2012 - Netherlands Proteomics Centre

NPC E4: Bioinformatics in Proteomics

What this research is about: De novo sequencing

Databases contain information about the genome sequences of many organisms. Using the information from these databases to identify proteins is usually a fast and reliable method. But although a large and growing amount of data is available in these databases, they are not complete. Not all organisms are represented. One example is the ostrich, which Bas van Breukelen and colleagues from the Biomolecular Mass Spectrometry and Proteomics Group in Utrecht have analysed. Furthermore, the data may contain mistakes, and mutations are not always included. “Therefore developing a database-free method is useful for protein identification,” explains Bas van Breukelen, NPC theme leader ‘Bioinformatics in Proteomics’.

Such a method, called de novo sequencing, has existed for some time. However, it has been shown to be accurate in only about 10% of cases. Processes such as the loss of a neutral molecule, or fragmentation that yields not only N-terminal but also C-terminal fragment ions, all create very complex spectra. “I am convinced that one can reduce this complexity by carefully designing an experiment,” says Van Breukelen.

The researchers bought an ostrich steak at the butcher and prepared it using a novel enzyme called Lys-N. They then applied a combination of two different methods for fragmenting the peptides (ETD and CID) and analysed these using mass spectrometry. The result was less complex data, which could be fed into a computer algorithm for de novo sequencing. A set of 2,744 new peptide sequences was identified. Since there is no database available for the ostrich proteome, Van Breukelen and colleagues had to devise another way to prove the accuracy of the results.
They compared the set of peptides against the evolutionary Tree of Life, from which the organism's position on the evolutionary ladder can be determined. The set of peptides was shown to belong to a bird, confirming the accuracy of the analytical method. “We have doubled the accuracy of the de novo sequencing,” claims Van Breukelen. A patent has been filed on this method. “We are now investigating whether commercial partners are interested.” In the meantime, higher-resolution mass spectrometers and a better visual interface will improve the applicability of this database-free sequencing method.

[Figure 1: flowchart of the de novo pipeline. Stages shown: SCX fractions; 27,067 CID and 27,071 ETD spectra; noise filtering (11,183 spectra); de novo interpretation (33,694,987 solutions); redundancy removal (5,765,043 unique solutions) into a de novo peptide sequence library; scoring through an established algorithm (220,722 CID and 217,017 ETD matches); filtering against the predicted de novo scan number; accepting only ETD/CID matches with the same sequence and the same precursor (8,890 matches); choosing the top-ranked ETD/CID solutions; 2,744 de novo peptides (0.7% FDR).]

Figure 1 | Schematic overview of the de novo pipeline. The entire process resulted in 8,890 de novo peptide solutions that represent an agreement between Mascot and the de novo algorithm as well as an agreement between ETD and CID. Collapsing the data further led to 2,744 unique non-redundant peptide sequences.

… lysine is followed by ETD fragmentation, spectra with single c-ion series, and hence easily interpretable sequence ladders, are produced.

This approach was shown to work very well in a proof of concept [4]. However, it also revealed that a large proportion of the fragmentation spectra still had gaps in their sequence ladders. These gaps in turn pose a challenge, as thousands of possible amino acid combinations can complete a given gap, thereby creating an ambiguity. To tackle this issue, each peptide was sequenced by both ETD (electron transfer dissociation) and CID (collision-induced dissociation).
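The gap ambiguity described above can be illustrated with a small sketch that enumerates every amino acid string whose summed residue mass matches a gap in a sequence ladder. This is a minimal illustration under stated assumptions, not the authors' de novo algorithm: the function name `fill_gap`, the maximum gap length of four residues and the default ±0.02 Da tolerance are hypothetical choices. The masses are standard monoisotopic residue masses, with leucine and isoleucine collapsed into one entry because they are isobaric.

```python
# Monoisotopic residue masses in daltons for the standard amino acids.
# Leucine and isoleucine share a mass, so "L" stands for both here.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def fill_gap(gap_mass, tol=0.02, max_len=4):
    """Hypothetical helper: return every residue string (length 1..max_len)
    whose summed residue mass lies within +/- tol Da of gap_mass."""
    residues = sorted(RESIDUE_MASS.items(), key=lambda item: item[1])
    solutions = []

    def extend(prefix, mass):
        # Record the prefix if it already explains the gap.
        if prefix and abs(mass - gap_mass) <= tol:
            solutions.append(prefix)
        if len(prefix) == max_len:
            return
        for aa, m in residues:
            # Residues are sorted by mass: once one overshoots, all heavier ones do too.
            if mass + m > gap_mass + tol:
                break
            extend(prefix + aa, mass + m)

    extend("", 0.0)
    return solutions


if __name__ == "__main__":
    print(fill_gap(128.05858))            # ['GA', 'AG', 'Q']
    print(fill_gap(128.05858, tol=0.05))  # ['GA', 'AG', 'Q', 'K']
    print(fill_gap(114.04293))            # ['GG', 'N']
```

Even this toy search returns three candidates for a single 128.06 Da gap (two orderings of G and A, plus Q), and loosening the tolerance to 0.05 Da also admits K. Longer gaps multiply the candidates further, which is why a second, complementary spectrum of the same peptide is useful for deciding between them.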
The de novo algorithm subsequently generated a library of all possible peptide solutions to any sequence gaps, together with decoy (scrambled) sequences, and fed these into the database search software Mascot [5].

Complementary fragmentation techniques

The de novo pipeline is depicted in Figure 1. After Lys-N protein digestion and SCX enrichment of peptides containing a single N-terminal lysine, nanoliter-flow liquid chromatography separation and mass spectrometric analysis are performed. Each peptide is sequenced using two fragmentation techniques, ETD and CID. The resulting spectra are subsequently read into the de novo algorithm, which performs noise fil-
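The final steps of Figure 1 (accepting only ETD/CID matches that share the same sequence and precursor, keeping the top-ranked solution, and reporting the result at a stated FDR with the help of scrambled decoy sequences) can be sketched as follows. This is a hedged illustration, not the published pipeline. The `Match` record, the `consensus_matches` and `filter_at_fdr` helpers, ranking a consensus pair by its summed CID and ETD scores, and the decoys-over-targets FDR estimate are all assumptions; the last is a common target-decoy convention, but the source does not say which estimator was used. The "filter against predicted de novo scan number" step is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """One peptide-spectrum match (hypothetical record layout)."""
    precursor_id: int  # shared by the CID and ETD scans of the same precursor
    sequence: str
    score: float
    is_decoy: bool     # True for scrambled decoy sequences


def consensus_matches(cid, etd):
    """For each precursor, keep the best sequence that CID and ETD agree on.
    Returns (combined_score, cid_match) pairs."""
    best_etd = {}
    for m in etd:
        key = (m.precursor_id, m.sequence)
        if key not in best_etd or m.score > best_etd[key].score:
            best_etd[key] = m

    best = {}
    for c in cid:
        e = best_etd.get((c.precursor_id, c.sequence))
        if e is None:
            continue  # no ETD match with the same sequence and precursor
        combined = c.score + e.score  # assumption: rank by summed scores
        if c.precursor_id not in best or combined > best[c.precursor_id][0]:
            best[c.precursor_id] = (combined, c)
    return list(best.values())


def filter_at_fdr(scored, max_fdr):
    """Keep the largest top-scoring set whose estimated FDR (decoys / targets)
    is <= max_fdr, then drop the decoys themselves."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    targets = decoys = cutoff = 0
    for i, (_, m) in enumerate(ranked, start=1):
        if m.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = i
    return [m for _, m in ranked[:cutoff] if not m.is_decoy]


if __name__ == "__main__":
    cid = [
        Match(1, "KPEPTIDE", 40.0, False),
        Match(1, "KPEPDITE", 45.0, False),  # CID's favourite, but ETD disagrees
        Match(2, "KSAMPLER", 30.0, False),
        Match(3, "KTEIPDPE", 25.0, True),   # scrambled decoy
        Match(4, "KLMNQR", 12.0, False),
    ]
    etd = [
        Match(1, "KPEPTIDE", 50.0, False),
        Match(2, "KSAMPLER", 28.0, False),
        Match(3, "KTEIPDPE", 22.0, True),
        Match(4, "KLMNRQ", 15.0, False),    # different sequence: rejected
    ]
    accepted = filter_at_fdr(consensus_matches(cid, etd), max_fdr=0.007)
    print([m.sequence for m in accepted])  # ['KPEPTIDE', 'KSAMPLER']
```

In the toy data, precursor 4 is dropped because its CID and ETD sequences differ, and with the threshold set to 0.7% to mirror Figure 1, including the decoy at rank three would raise the estimated FDR to 50%, so only the two agreeing target sequences survive.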
