
Clinical Proteomics


METHODS IN MOLECULAR BIOLOGY™

John M. Walker, SERIES EDITOR

447. Alcohol: Methods and Protocols, edited by Laura E. Nagy, 2008

446. Post-translational Modification of Proteins: Tools for Functional Proteomics, Second Edition, edited by Christoph Kannicht, 2008

443. Molecular Modeling of Proteins, edited by Andreas Kukol, 2008

439. Genomics Protocols: Second Edition, edited by Mike Starkey and Ramnath Elaswarapu, 2008

438. Neural Stem Cells: Methods and Protocols, Second Edition, edited by Leslie P. Weiner, 2008

437. Drug Delivery Systems, edited by Kewal K. Jain, 2008

436. Avian Influenza Virus, edited by Erica Spackman, 2008

435. Chromosomal Mutagenesis, edited by Greg Davis and Kevin J. Kayser, 2008

434. Gene Therapy Protocols: Volume 2: Design and Characterization of Gene Transfer Vectors, edited by Joseph M. LeDoux, 2008

433. Gene Therapy Protocols: Volume 1: Production and In Vivo Applications of Gene Transfer Vectors, edited by Joseph M. LeDoux, 2007

432. Organelle Proteomics, edited by Delphine Pflieger and Jean Rossier, 2008

431. Bacterial Pathogenesis: Methods and Protocols, edited by Frank DeLeo and Michael Otto, 2008

430. Hematopoietic Stem Cell Protocols, edited by Kevin D. Bunting, 2008

429. Molecular Beacons: Signalling Nucleic Acid Probes, Methods and Protocols, edited by Andreas Marx and Oliver Seitz, 2008

428. Clinical Proteomics: Methods and Protocols, edited by Antonia Vlahou, 2008

427. Plant Embryogenesis, edited by Maria Fernanda Suarez and Peter Bozhkov, 2008

426. Structural Proteomics: High-Throughput Methods, edited by Bostjan Kobe, Mitchell Guss, and Thomas Huber, 2008

425. 2D PAGE: Volume 2: Applications and Protocols, edited by Anton Posch, 2008

424. 2D PAGE: Volume 1: Sample Preparation and Pre-Fractionation, edited by Anton Posch, 2008

423. Electroporation Protocols, edited by Shulin Li, 2008

422. Phylogenomics, edited by William J. Murphy, 2008

421. Affinity Chromatography: Methods and Protocols, Second Edition, edited by Michael Zachariou, 2008

420. Drosophila: Methods and Protocols, edited by Christian Dahmann, 2008

419. Post-Transcriptional Gene Regulation, edited by Jeffrey Wilusz, 2008

418. Avidin-Biotin Interactions: Methods and Applications, edited by Robert J. McMahon, 2008

417. Tissue Engineering, Second Edition, edited by Hannsjörg Hauser and Martin Fussenegger, 2007

416. Gene Essentiality: Protocols and Bioinformatics, edited by Svetlana Gerdes and Andrei L. Osterman, 2008

415. Innate Immunity, edited by Jonathan Ewbank and Eric Vivier, 2007

414. Apoptosis in Cancer: Methods and Protocols, edited by Gil Mor and Ayesha Alvero, 2008

413. Protein Structure Prediction, Second Edition, edited by Mohammed Zaki and Chris Bystroff, 2008

412. Neutrophil Methods and Protocols, edited by Mark T. Quinn, Frank R. DeLeo, and Gary M. Bokoch, 2007

411. Reporter Genes for Mammalian Systems, edited by Don Anson, 2007

410. Environmental Genomics, edited by Cristofre C. Martin, 2007

409. Immunoinformatics: Predicting Immunogenicity In Silico, edited by Darren R. Flower, 2007

408. Gene Function Analysis, edited by Michael Ochs, 2007

407. Stem Cell Assays, edited by Mohan C. Vemuri, 2007

406. Plant Bioinformatics: Methods and Protocols, edited by David Edwards, 2007

405. Telomerase Inhibition: Strategies and Protocols, edited by Lucy Andrews and Trygve O. Tollefsbol, 2007

404. Topics in Biostatistics, edited by Walter T. Ambrosius, 2007

403. Patch-Clamp Methods and Protocols, edited by Peter Molnar and James J. Hickman, 2007

402. PCR Primer Design, edited by Anton Yuryev, 2007

401. Neuroinformatics, edited by Chiquito J. Crasto, 2007

400. Methods in Membrane Lipids, edited by Alex Dopico, 2007

399. Neuroprotection Methods and Protocols, edited by Tiziana Borsello, 2007

398. Lipid Rafts, edited by Thomas J. McIntosh, 2007

397. Hedgehog Signaling Protocols, edited by Jamila I. Horabin, 2007

396. Comparative Genomics, Volume 2, edited by Nicholas H. Bergman, 2007

395. Comparative Genomics, Volume 1, edited by Nicholas H. Bergman, 2007

394. Salmonella: Methods and Protocols, edited by Heide Schatten and Abraham Eisenstark, 2007

393. Plant Secondary Metabolites, edited by Harinder P. S. Makkar, P. Siddhuraju, and Klaus Becker, 2007

392. Molecular Motors: Methods and Protocols, edited by Ann O. Sperry, 2007

391. MRSA Protocols, edited by Yinduo Ji, 2007

390. Protein Targeting Protocols, Second Edition, edited by Mark van der Giezen, 2007

389. Pichia Protocols, Second Edition, edited by James M. Cregg, 2007

388. Baculovirus and Insect Cell Expression Protocols, Second Edition, edited by David W. Murhammer, 2007

387. Serial Analysis of Gene Expression (SAGE): Digital Gene Expression Profiling, edited by Kare Lehmann Nielsen, 2007

386. Peptide Characterization and Application Protocols, edited by Gregg B. Fields, 2007

385. Microchip-Based Assay Systems: Methods and Applications, edited by Pierre N. Floriano, 2007


METHODS IN MOLECULAR BIOLOGY™

Clinical Proteomics

Methods and Protocols

Edited by

Antonia Vlahou

Biomedical Research Foundation, Academy of Athens, Athens, Greece


Editor

Antonia Vlahou

Academy of Athens

Biomedical Research Foundation

Athens, Greece

Athens 115 27

e-mail: vlahoua@bioacademy.gr

Series Editor

John M. Walker

School of Life Sciences

University of Hertfordshire

Hatfield, Herts., AL10 9AB

UK

ISBN: 978-1-58829-837-9 e-ISBN: 978-1-59745-117-8

Library of Congress Control Number: 2007939413

©2008 Humana Press, a part of Springer Science+Business Media, LLC

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Humana Press, 999 Riverview Drive, Suite 208, Totowa, NJ 07512 USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

While the advice and information in this book are believed to be true and accurate at the date of going to press, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

987654321

springer.com


Preface

Clinical proteomics has rapidly evolved over the past few years and is continuously growing as new methodologies and technologies emerge. In this volume, leading researchers in the field have contributed their state-of-the-art methodologies on protein profiling and identification of disease biomarkers in tissues, microdissected cells, and body fluids. Experimental approaches involving the application of two-dimensional electrophoresis, multidimensional liquid chromatography, SELDI/MALDI mass spectrometry, and protein arrays, as well as the bioinformatics and statistical tools pertinent to the analysis of proteomics data, are described. As stated in the introductory chapter by Prof. Paik, the Vice President of the Human Proteome Organization, “clinical proteomics needs the integration of biochemistry, pathology, analytical technology, bioinformatics, and proteome informatics to develop highly sensitive diagnostic tools for routine clinical care in the future.” The multidisciplinary character of clinical proteomics approaches is evident in the detailed step-by-step protocols described in this volume, which makes them potentially useful to a wide range of researchers, including clinicians, molecular biologists, chemists, bioinformaticians, and computational biologists.

Antonia Vlahou


Acknowledgments

The editor gratefully acknowledges all contributing authors for their collaboration, which made this project possible and brought it to fruition; the series editor, Prof. John Walker, whose help and guidance have been instrumental; and Mr. Patrick Marton, Mr. David Casey, and the whole production team at Humana, headed by the late Mr. Tom Lanigan, for an excellent production of this book.


Contents

Preface

Acknowledgments

Contributors

1. Overview and Introduction to Clinical Proteomics

Young-Ki Paik, Hoguen Kim, Eun-Young Lee, Min-Seok Kwon, and Sang Yun Cho

Part I: Specimen Collection for Clinical Proteomics

2. Specimen Collection and Handling: Standardization of Blood Sample Collection

Harald Tammen

3. Tissue Sample Collection for Proteomics Analysis

Jose I. Diaz, Lisa H. Cazares, and O. John Semmes

Part II: Clinical Proteomics by 2DE and Direct MALDI/SELDI MS Profiling

4. Protein Profiling of Human Plasma Samples by Two-Dimensional Electrophoresis

Sang Yun Cho, Eun-Young Lee, Hye-Young Kim, Min-Jung Kang, Hyoung-Joo Lee, Hoguen Kim, and Young-Ki Paik

5. Analysis of Laser Capture Microdissected Cells by 2-Dimensional Gel Electrophoresis

Daohai Zhang and Evelyn Siew-Chuan Koay

6. Optimizing the Difference Gel Electrophoresis (DIGE) Technology

David B. Friedman and Kathryn S. Lilley

7. MALDI/SELDI Protein Profiling of Serum for the Identification of Cancer Biomarkers

Lisa H. Cazares, Jose I. Diaz, Rick R. Drake, and O. John Semmes

8. Urine Sample Preparation and Protein Profiling by Two-Dimensional Electrophoresis and Matrix-Assisted Laser Desorption Ionization Time of Flight Mass Spectroscopy

Panagiotis G. Zerefos and Antonia Vlahou

9. Combining Laser Capture Microdissection and Proteomics Techniques

Dana Mustafa, Johan M. Kros, and Theo Luider

Part III: Clinical Proteomics by LC-MS Approaches

10. Comparison of Protein Expression by Isotope-Coded Affinity Tag Labeling

Zhen Xiao and Timothy D. Veenstra

11. Analysis of Microdissected Cells by Two-Dimensional LC-MS Approaches

Chen Li, Yi-Hong, Ye-Xiong Tan, Jian-Hua Ai, Hu Zhou, Su-Jun Li, Lei Zhang, Qi-Chang Xia, Jia-Rui Wu, Hong-Yang Wang, and Rong Zeng

12. Label-Free LC-MS Method for the Identification of Biomarkers

Richard E. Higgs, Michael D. Knierman, Valentina Gelfanova, Jon P. Butler, and John E. Hale

13. Analysis of the Extracellular Matrix and Secreted Vesicle Proteomes by Mass Spectrometry

Zhen Xiao, Thomas P. Conrads, George R. Beck, Jr., and Timothy D. Veenstra

Part IV: Clinical Proteomics and Antibody Arrays

14. Miniaturized Parallelized Sandwich Immunoassays

Hsin-Yun Hsu, Silke Wittemann, and Thomas O. Joos

15. Dissecting Cancer Serum Protein Profiles Using Antibody Arrays

Marta Sanchez-Carbayo

Part V: Statistics and Bioinformatics in Clinical Proteomics Data Analysis

16. 2D-PAGE Maps Analysis

Emilio Marengo, Elisa Robotti, and Marco Bobba

17. Finding the Significant Markers: Statistical Analysis of Proteomic Data

Sebastien Christian Carpentier, Bart Panis, Rony Swennen, and Jeroen Lammertyn

18. Web-Based Tools for Protein Classification

Costas D. Paliakasis, Ioannis Michalopoulos, and Sophia Kossida

19. Open-Source Platform for the Analysis of Liquid Chromatography-Mass Spectrometry (LC-MS) Data

Matthew Fitzgibbon, Wendy Law, Damon May, Andrea Detter, and Martin McIntosh

20. Pattern Recognition Approaches for Classifying Proteomic Mass Spectra of Biofluids

Ray L. Somorjai

Index


Contributors

Jian-Hua Ai • Eastern Hepatobiliary Surgery Hospital, Shanghai, China

George R. Beck, Jr. • Division of Endocrinology, Metabolism and Lipids, Emory University School of Medicine, Atlanta, GA

Marco Bobba • University of Eastern Piedmont, Department of Environmental and Life Sciences, Alessandria, Italy

Jon P. Butler • Lilly Corporate Center, Indianapolis, IN

Sebastien Christian Carpentier • Faculty of Bioscience Engineering, Division of Crop Biotechnics, K.U. Leuven, Leuven, Belgium

Lisa H. Cazares • The George L. Wright Jr. Center for Biomedical Proteomics, Eastern Virginia Medical School, Norfolk, VA

Sang Yun Cho • Yonsei Biomedical Proteome Research Center, Department of Biochemistry, College of Sciences, Seoul, Korea

Thomas P. Conrads • Laboratory of Proteomics and Analytical Technologies, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, MD

Andrea Detter • Fred Hutchinson Cancer Research Center, Seattle, WA

Jose I. Diaz • Cancer Therapy Research Center’s Institute for Drug Development, University of Texas Health Science Center, San Antonio, TX

Rick R. Drake • Eastern Virginia Medical School, Norfolk, VA

Matthew Fitzgibbon • Fred Hutchinson Cancer Research Center, Seattle, WA

David B. Friedman • Proteomics Laboratory, Mass Spectrometry Research Center, Department of Biochemistry, Vanderbilt University School of Medicine, Nashville, TN

Valentina Gelfanova • Lilly Corporate Center, Indianapolis, IN

John E. Hale • Lilly Corporate Center, Indianapolis, IN

Richard E. Higgs • Lilly Corporate Center, Indianapolis, IN

Yi-Hong • Eastern Hepatobiliary Surgery Hospital, Shanghai, China

Hsin-Yun Hsu • Biochemistry Department, NMI Natural and Medical Sciences Institute at the University of Tuebingen, Reutlingen, Germany

Thomas O. Joos • Biochemistry Department, NMI Natural and Medical Sciences Institute at the University of Tuebingen, Reutlingen, Germany

Min-Jung Kang • Yonsei Biomedical Proteome Research Center, Department of Biochemistry, College of Sciences, Seoul, Korea


Hoguen Kim • Department of Pathology, College of Medicine, Yonsei University, Seoul, Korea

Hye-Young Kim • Yonsei Biomedical Proteome Research Center, Department of Biochemistry, College of Sciences, Seoul, Korea

Michael D. Knierman • Lilly Corporate Center, Indianapolis, IN

Evelyn Siew-Chuan Koay • Department of Pathology, Yong Loo Lin School of Medicine, National University of Singapore, and Molecular Diagnosis Center, Department of Laboratory Medicine, National University Hospital, Singapore

Sophia Kossida • Division of Biotechnology, Biomedical Research Foundation, Academy of Athens, Athens, Greece

Johan M. Kros • Department of Pathology, Josephine Nefkens Institute, Erasmus Medical Center, Rotterdam, The Netherlands

Min-Seok Kwon • Yonsei Biomedical Proteome Research Center, Department of Biochemistry, College of Sciences, Seoul, Korea

Jeroen Lammertyn • Faculty of Bioscience Engineering, Division of Mechatronics, Biostatistics and Sensors, K.U. Leuven, Leuven, Belgium

Wendy Law • Fred Hutchinson Cancer Research Center, Seattle, WA

Eun-Young Lee • Yonsei Biomedical Proteome Research Center, Department of Biochemistry, College of Sciences, Seoul, Korea

Hyoung-Joo Lee • Yonsei Biomedical Proteome Research Center, Department of Biochemistry, College of Sciences, Seoul, Korea

Chen Li • Research Center for Proteome Analysis, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China

Su-Jun Li • Research Center for Proteome Analysis, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China

Kathryn S. Lilley • Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, United Kingdom

Theo Luider • Laboratories of Neuro-Oncology/Clinical and Cancer Proteomics, Josephine Nefkens Institute, Erasmus Medical Center, Rotterdam, The Netherlands

Emilio Marengo • Department of Environmental and Life Sciences, University of Eastern Piedmont, Alessandria, Italy

Damon May • Fred Hutchinson Cancer Research Center, Seattle, WA

Martin McIntosh • Fred Hutchinson Cancer Research Center, Seattle, WA

Ioannis Michalopoulos • Biomedical Research Foundation, Academy of Athens, Athens, Greece

Dana Mustafa • Department of Pathology, Josephine Nefkens Institute, Erasmus Medical Center, Rotterdam, The Netherlands


Young-Ki Paik • Department of Biochemistry, Yonsei Proteome Research Center & Biomedical Proteome Research Center, Seoul, Korea

Costas D. Paliakasis • Biomedical Research Foundation, Academy of Athens, Athens, Greece

Bart Panis • Faculty of Bioscience Engineering, Division of Crop Biotechnics, K.U. Leuven, Leuven, Belgium

Elisa Robotti • Department of Environmental and Life Sciences, University of Eastern Piedmont, Alessandria, Italy

Marta Sanchez-Carbayo • Tumor Markers Group, Spanish National Cancer Center (CNIO), Madrid, Spain

O. John Semmes • The George L. Wright Jr. Center for Biomedical Proteomics, Eastern Virginia Medical School, Norfolk, VA

Ray L. Somorjai • Biomedical Informatics, Institute for Biodiagnostics, National Research Council, Winnipeg, Manitoba, Canada

Rony Swennen • Faculty of Bioscience Engineering, Division of Crop Biotechnics, K.U. Leuven, Leuven, Belgium

Harald Tammen • Digilab BioVisioN GmbH, Hannover, Germany

Ye-Xiong Tan • Eastern Hepatobiliary Surgery Hospital, Shanghai, China

Timothy D. Veenstra • Laboratory of Proteomics and Analytical Technologies, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, MD

Antonia Vlahou • Division of Biotechnology, Biomedical Research Foundation, Academy of Athens, Athens, Greece

Hong-Yang Wang • Eastern Hepatobiliary Surgery Hospital, Shanghai, China

Silke Wittemann • Biochemistry Department, NMI Natural and Medical Sciences Institute at the University of Tuebingen, Reutlingen, Germany

Jia-Rui Wu • Research Center for Proteome Analysis, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China

Qi-Chang Xia • Research Center for Proteome Analysis, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China

Zhen Xiao • Laboratory of Proteomics and Analytical Technologies, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, MD

Rong Zeng • Research Center for Proteome Analysis, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China

Panagiotis G. Zerefos • Division of Biotechnology, Biomedical Research Foundation, Academy of Athens, Athens, Greece


Daohai Zhang • Molecular Diagnosis Center, Department of Laboratory Medicine, National University Hospital, Singapore, and Department of Pathology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore

Lei Zhang • Research Center for Proteome Analysis, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China

Hu Zhou • Research Center for Proteome Analysis, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China


1

Overview and Introduction to Clinical Proteomics

Young-Ki Paik, Hoguen Kim, Eun-Young Lee, Min-Seok Kwon, and Sang Yun Cho

Summary<br />

As the field of clinical proteomics progresses, the discovery of disease biomarkers becomes paramount. However, the immediate challenges are to establish standard operating procedures for both clinical specimen handling and the reduction of sample complexity, and to increase the ability to detect proteins and peptides present in low amounts. The traditional concept of a disease biomarker is shifting toward a new paradigm, namely, that an ensemble of proteins or peptides would be more efficient than a single protein or peptide in the diagnosis of disease. Because clinical proteomics usually requires easy access to well-defined fresh clinical specimens (including morphologically consistent tissue and properly pretreated body fluids of sufficient quantity), biorepository systems need to be established. Here, we address these issues and emphasize the necessity of developing various microdissection techniques for tissue specimens, multidimensional fractionation for body fluids, and other related techniques (including bioinformatics), tools that could become integral parts of clinical proteomics for disease biomarker discovery.

Key Words: biomarker; body fluids; clinical proteomics; translational proteomics; depletion; biorepository; multidimensional fractionation; specimen bank; biomarker panel.

Abbreviations: CSF: Cerebrospinal Fluid, SILAC: Stable Isotope Labeling with Amino acids in Cell culture, FFE: Free Flow Electrophoresis, IMAC: Immobilized Metal Affinity Chromatography, 2DE: 2-dimensional gel electrophoresis, CBB: Coomassie Brilliant Blue, SELDI: Surface-Enhanced Laser Desorption/Ionization, MALDI: Matrix-Assisted Laser Desorption/Ionization, MDLC: Multi-dimensional Liquid Chromatography, LC: Liquid Chromatography, TOF: Time-of-Flight, CID: Collision-induced dissociation, ETD: Electron Transfer Dissociation, LIT: Linear Ion-Trap, FT: Fourier-Transform, Q: Quadrupole, ELISA: Enzyme-Linked Immunosorbent Assay, SISCAPA: Stable Isotope Standards with Capture by Anti-Peptide Antibody, AQUA: Absolute Quantitative Analysis. Commercial brands are also shown: MARS: Multiple Affinity Removal System (Agilent, Palo Alto, CA, USA), Enchant™: Enchant™ Multi-protein Affinity Separation Kit (Pall Life Sciences, Ann Arbor, MI, USA), Gradiflow™: Gradiflow™ Separation (Life Bioprocess, Frenchs Forest, Australia), FFE™: BD Free Flow Electrophoresis System (BD Diagnostics, Martinsried/Planegg, Germany), Zoom®: Zoom® Benchtop Proteomics System (Invitrogen Corporation, Carlsbad, CA, USA), Rotofor: Bio-Rad Rotofor® Prep IEF Cell (Bio-Rad, Hercules, CA, USA), PF2D: ProteomeLab™ PF2D Protein Fractionation System (Beckman Coulter, Inc., Fullerton, CA, USA), DIGE: Ettan™ DIGE System (GE Healthcare Bio-Sciences AB, Uppsala, Sweden), Deep Purple™: Deep Purple™ Total Protein Stain (GE Healthcare Bio-Sciences AB, Uppsala, Sweden), ICAT™: Isotope-coded affinity tags (Applied Biosystems, Foster City, CA, USA), iTRAQ™: iTRAQ™ Reagents (Applied Biosystems, Foster City, CA, USA), Q-TRAP™ (Applied Biosystems, Foster City, CA, USA).

From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols

Edited by: A. Vlahou © Humana Press, Totowa, NJ

1. Overview and Scope of Clinical Proteomics

Clinical proteomics is defined as the comprehensive qualitative and quantitative profiling of proteins (and peptides) present in clinical specimens such as body fluids and tissues. Comparing specimens from healthy and diseased individuals may lead to the discovery of a disease biomarker (1). A biomarker serves as a molecular signature reflecting stages of disease before or after treatment, and it can also be used for prognostic purposes in monitoring the response to treatment (2). Clinical proteomics consists of a variety of experimental processes, including the collection of well-phenotyped clinical specimens, analysis of proteins or peptides of interest, data interpretation, and validation of proteomics data in a clinical context (Fig. 1). After a few disease biomarker candidates have been identified through extensive profiling, translational proteomics follows, involving validation in a cohort study. Even after proper identification and verification of a disease biomarker, it takes a long time to prove that the biomarker is applicable to clinical diagnosis or prognosis (3,4).

Fig. 1. Clinical and translational proteomics. The key components of experimental methods are included in each box.

There has been a remarkable increase in the publication of clinical proteomics papers within a short period [more than 800 papers in 2006 (Fig. 2)], coinciding with the rapid growth of proteomics. Reflecting this trend, this chapter reviews the core technologies used in clinical proteomics with respect to sample specimen processing, protein separation platforms (e.g., gel-based systems or liquid-based methods), quantitative labeling, mass spectrometry (MS), and proteome informatics tools. Despite the advent of new technologies, several bottlenecks remain in the proteomics field, such as the lack of dataset standardization, quantification of the proteins of interest, verification of the proteins or peptides identified, and the absence of an overall strategy for handling biomarkers after identification. Thus, the pace of biomarker discovery, one of the key agendas of clinical proteomics, will depend on how well these bottlenecks are resolved by technical advancement (4). The following sections address these issues in the context of clinical proteomics.

Fig. 2. Recent trends in clinical proteomics publications. The distribution of articles related to clinical proteomics listed in PubMed is shown here. The following query was used to search for articles: (clinical[All Fields] OR ((“biological markers”[TIAB] NOT Medline[SB]) OR “biological markers”[MeSH Terms] OR biomarker[Text Word])) AND (“proteomics”[MeSH Terms] OR proteomics[Text Word] OR proteomic[All Fields] OR “proteome”[MeSH Terms] OR proteome[Text Word]).
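The yearly counts behind Fig. 2 could in principle be re-queried programmatically. The sketch below is an illustration only, not the authors' procedure: it assumes NCBI's public E-utilities `esearch` endpoint and its `db`, `term`, `datetype`, `mindate`, `maxdate`, `rettype`, and `retmode` parameters, and the helper names `esearch_count_url` and `fetch_count` are hypothetical. Straight quotes replace the typographic quotes of the printed query. Because PubMed indexing has changed since the figure was prepared, present-day counts should not be expected to reproduce it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities search endpoint (assumed reachable from your network).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The Fig. 2 query, with straight quotes instead of typographic ones.
QUERY = (
    '(clinical[All Fields] OR (("biological markers"[TIAB] NOT Medline[SB]) '
    'OR "biological markers"[MeSH Terms] OR biomarker[Text Word])) AND '
    '("proteomics"[MeSH Terms] OR proteomics[Text Word] OR proteomic[All Fields] '
    'OR "proteome"[MeSH Terms] OR proteome[Text Word])'
)


def esearch_count_url(term: str, year: int) -> str:
    """Build an esearch URL that asks only for the number of hits published in one year."""
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",  # restrict by publication date
        "mindate": str(year),
        "maxdate": str(year),
        "rettype": "count",  # return the hit count, not the list of PMIDs
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


def fetch_count(term: str, year: int) -> int:
    """Query PubMed over the network and return the hit count for one year."""
    with urlopen(esearch_count_url(term, year), timeout=30) as response:
        payload = json.load(response)
    return int(payload["esearchresult"]["count"])
```

For example, `fetch_count(QUERY, 2006)` would return the current PubMed count for 2006. When looping over several years, keep requests below NCBI's rate limit for clients without an API key (three per second).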



2. Sample Specimens and Processing Techniques Used for Clinical Proteomics

2.1. General Considerations

Because clinical proteomics relies heavily on patient specimens, three important factors need to be considered before the selection and preparation of clinical specimens: (1) selection of the correct clinical samples according to the type of research, (2) isolation of the appropriate component from the clinical samples, and (3) establishment of optimal experimental conditions for each sample (5,6,7,8). When selecting clinical samples, the relationship between the samples and the specific disease should also be considered. For example, although cancer tissue represents a specific cancer, several types of body fluids from patients may also be related to the cancer. If the selected clinical samples specifically represent the disease, the next step is to evaluate which components are related to the specific disease. For instance, tumor cells in cancerous tissues are surrounded by many types of stromal cells, inflammatory cells, and connective tissues that are directly related to changes in protein expression in the cancer. If the purpose of proteomic analysis is to identify characteristic changes of specific proteins in tumor cells, then the tumor cell percentage must be determined precisely, and tissue microdissection, which can increase this percentage, would appear to be necessary (5,6,7). Because specimen conditions directly affect the results of biomarker discovery, well-defined clinical specimens should be used; disease biomarkers are much easier to discover when the samples have clear anatomical and pathophysiological definitions. Because clinical specimens are heterogeneous, sophisticated pathological discrimination is required for the isolation of specific diseased tissue or body fluids. Without the expertise of a pathologist at the earliest stage, it may be difficult to isolate a specifically defined specimen for clinical proteomics.
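Why tumor cell percentage matters can be illustrated with a simple linear mixing model. This is an assumption made for illustration, not a method from this chapter: it supposes that a protein change occurs only in tumor cells, that non-tumor cells (stromal, inflammatory) stay at the normal level, and that every cell type contributes equal amounts of total protein. Under these assumptions, a bulk sample with tumor fraction f shows an observed fold change of f·FC + (1 − f), where FC is the fold change within tumor cells. The function name `observed_fold_change` is hypothetical.

```python
def observed_fold_change(tumor_fraction: float, tumor_fold_change: float) -> float:
    """Fold change seen in a bulk sample when only the tumor cells carry the change.

    Hypothetical linear mixing model: non-tumor cells are assumed to stay at the
    normal level (fold change 1.0), and all cells are assumed to contribute
    equal amounts of total protein.
    """
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must lie between 0 and 1")
    return tumor_fraction * tumor_fold_change + (1.0 - tumor_fraction) * 1.0


# A fourfold change confined to tumor cells, before and after (hypothetically)
# raising the tumor content from 40% to 90% by microdissection.
for fraction in (0.4, 0.9):
    print(f"tumor fraction {fraction:.0%}: observed fold change "
          f"{observed_fold_change(fraction, 4.0):.2f}")
```

In this hypothetical case, the observed fold change rises from 2.2 to 3.7 as tumor content increases. Real specimens will deviate from the equal-protein-per-cell assumption, so the model only indicates the direction and rough size of the dilution effect.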

Generally, clinical samples contain variable factors and components originating from the microenvironment of specific tissues. For instance, liver tissue usually contains a large amount of blood in the sinusoids, and this amount increases in tissue with dilated sinusoids (9). Lung tissue usually contains deposited exogenous material, the amount of which increases in heavy smokers (10). Note that the amount of blood present in isolated tissue may directly influence the relative proportions of the proteins found in clinical specimens. Deposited materials and other chemicals, such as the stains and fixatives used in microdissection, may also influence the experimental conditions (11). In the analysis of clinical samples, suitable buffer conditions, minimal lysis time, and high-yield protein precipitation are highly recommended. To avoid substantial variation between experiments, a large set of clinical specimens is also necessary because, unlike cultured cell lines, clinical specimens have high component variability (12). Specific disease types are described in more detail throughout this volume.
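
The link between component variability and the number of specimens required can be made quantitative with the standard normal-approximation sample-size formula for comparing two group means. The sketch below applies that textbook formula under its usual assumptions (equal group sizes, equal standard deviations, two-sided test); the standard deviations in the usage lines are hypothetical, not data from this chapter.

```python
import math
from statistics import NormalDist


def specimens_per_group(sd: float, difference: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate specimens needed per group to detect a mean difference.

    Normal-approximation formula for a two-sided, two-sample comparison with equal
    group sizes and equal standard deviations:
        n = 2 * (z_{1 - alpha/2} + z_{power})**2 * sd**2 / difference**2
    """
    if sd <= 0 or difference <= 0:
        raise ValueError("sd and difference must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    n = 2 * (z_alpha + z_power) ** 2 * sd ** 2 / difference ** 2
    return math.ceil(n)  # round up: a fraction of a specimen is not possible


if __name__ == "__main__":
    # Hypothetical spreads, in the same units (e.g., log2 abundance) as the difference.
    for label, sd in (("cultured cell line", 0.5), ("clinical specimens", 2.0)):
        print(label, specimens_per_group(sd, difference=1.0))
```

Because n scales with the variance, a fourfold larger standard deviation raises the requirement roughly sixteenfold (here from 4 to 63 specimens per group for the same one-unit difference).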

2.2. Body Fluids

A survey of the literature shows five to six different types of clinical specimens. Body fluids (e.g., plasma, urine, tears, cerebrospinal fluid, lymph, and ascites), tissues (e.g., liver, heart, muscle, brain, and lung), cells, bone, and hair have all been used for clinical proteomics (Table 1) (13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33). Each has its own merits and limitations for biomarker discovery via proteomic analysis. Among these specimen types, the number of publications using body fluids has increased recently, perhaps because body fluids are convenient and easy to use for noninvasive diagnosis. Because proteins secreted into body fluids during or after disease may reflect a broad range of pathophysiological conditions, much emphasis has been placed on identifying prominent protein/peptide biomarkers that are differentially expressed at different disease stages. In the literature, the terms “body fluids” and “biofluids” are used interchangeably, although the former more often refers to samples obtained directly from patients, whereas the latter is applied more broadly, referring to liquid or liquid-like samples obtained from living organisms, including model animals and plants. Throughout this chapter we use “body fluids” for clarity.

Given the large dynamic range of protein and peptide sources, plasma (a complex liquid interface between tissues) and extracellular fluids may be the best body fluids to use for clinical proteomics and biomarker discovery (34,35,36,37,38). In addition to plasma, more than a dozen other body fluids, ranging from urine to peritoneal fluid, are currently used for biomarker discovery (Table 1). However, the biggest challenge in body fluid proteomics may be the multiple pretreatment steps required before analysis, including depletion of high-abundance proteins (in the case of plasma) (34,35,36) and/or protein enrichment (in the case of urine) (15,39) (Table 1). Thus, the outcome of clinical proteomics may depend on proper sample processing, since how the most specific type of specimen is selected and handled will affect the overall profiling pattern. Because the details of body fluid proteomics have been well described by Shen Hu et al. (38), we focus here on only a few essential points.

First, standard measures need to be introduced to protect specimens from nonspecific proteolysis, lysis, and modification during collection and preparation (11). For the standardization of blood sample collection, Tammen highlights many useful considerations regarding preanalytical variables in plasma proteomics, which can be applied to processes involving blood specimens [(40) and see Chapter 2]. More specific problems in sample handling are also addressed by Rai et al. (41). Second, to increase the dynamic range of detection and reduce sample heterogeneity, pretreatments such as depletion of high-abundance proteins appear to be required (34,35,36), and several such steps may be needed during initial sample processing. Multiple fractionations of clinical samples before the main separation would further reduce sample complexity. Note that coremoval of low-abundance proteins during this type of multiple depletion (36,42) and modification of proteins of interest during or after isolation (43) should be considered as well. Addressing several problems encountered in specimen collection, Xiao et al. (Chapter 13) describe different methods to isolate extracellular matrix (ECM) and analyze the proteome of secreted vesicles. These methods will be useful for studying ECM and secreted vesicles in various samples, ranging from primary cultured cells to tissue specimens. In short, the best pretreatment options should be chosen before the main experiment is performed.

Table 1
Types of Biological Specimens Used in Clinical Proteomics

Fluid: secretions
• Plasma/serum (13,14)
• Urine: prostate cancer (15)
• Nasal discharge: seasonal allergic rhinitis (16)
• Tears: blepharitis and dry eye (17,18)
• Saliva: oral and breast cancer (19)
• Amniotic/cervical fluid: fetal aneuploidy and intra-amniotic inflammation (20,21)
Characteristics of the samples:
• Routinely accessible body fluids
• Very important in the discovery of biomarkers of diseases (systemic vs. organ-specific/local)
• Important for early detection, disease severity, prognosis, and monitoring of response to therapy

Fluid: proximal and body cavity fluids
• Follicular fluid: recurrent spontaneous abortion (22)
• Seminal fluid: male infertility (23)
• Nipple aspirate fluid: breast cancer (24)
• Cerebrospinal fluid: brain tumor (25)
• Synovial fluid: rheumatoid arthritis (26)
• Ascites: ovarian cancer (13)
• Bronchial lavage fluid: chronic obstructive pulmonary disease, asthma, and lung disease (27,28)
• Pleural fluid: lung cancer (29)
• Peritoneal fluid: ovarian cancer (14)
Characteristics of the samples:
• Can reflect disease perturbations in the organs or tissues from which they are secreted
• The synovial biopsy procedure is not very difficult

Pretreatment required for proteomics (body fluids):
• Considerations for sample adequacy
  – Storage
  – Hemolysis
  – Influence of anticoagulants
  – Consistent results
• Consider whether to pool samples or analyze individual samples
• Depletion of high-abundance proteins (albumin constitutes about 50% of plasma proteins)
• Mucosa and salts must be removed

Tissue, LCM- or LMPC-isolated, or formalin-fixed paraffin-embedded: any type of disease (30)
Characteristics of the samples:
• Very important for the development of novel in situ biomarkers
• Immunofluorescence, immunocytochemistry, imaging mass spectrometry

Cell, cell lines or primary tissue culture: any type of disease (31)
Characteristics of the samples:
• Very important in the discovery of biomarker candidates
• Validation should be performed using primary tumor samples (e.g., immunohistologic methods, imaging MS)

Bone, cartilage: rheumatoid arthritis (32)
Characteristics of the samples:
• Cartilage consists mainly of extracellular matrix, mostly collagens and proteoglycans

Hair (33)
Characteristics of the samples:
• Over 300 proteins were found to constitute the insoluble complex formed by transglutaminase crosslinking

Pretreatment required for proteomics (tissue, cell, bone, and hair samples):
• Considerations for sample adequacy
• Integrity and degradation of proteins
• Contamination (microorganisms, extraneous material)
• Desalting and removal of media components
• Cetylpyridinium chloride effectively aggregates proteoglycans
• Sufficient extraction of protein from the insoluble complex is needed
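
The trade-off between depletion and coremoval can be made concrete with a rough calculation. The sketch below is our own simplification, not a protocol from the cited studies: it assumes the same total protein mass is loaded before and after depletion, so removing a fraction d of the protein mass raises every remaining protein's share by 1/(1 - d), offset by any fraction c of the protein of interest that is coremoved. The albumin figure (about 50% of plasma protein) comes from Table 1; the 30% coremoval value is hypothetical.

```python
def loading_gain(depleted_fraction: float, coremoval_fraction: float = 0.0) -> float:
    """Relative increase in the on-column amount of a remaining low-abundance protein.

    Simplifying assumptions (illustrative only):
    - the same total protein mass is loaded before and after depletion;
    - depletion removes `depleted_fraction` of the total protein mass;
    - `coremoval_fraction` of the protein of interest is lost along with it;
    - the protein of interest is a negligible share of the total mass.
    """
    if not 0.0 <= depleted_fraction < 1.0:
        raise ValueError("depleted_fraction must be in [0, 1)")
    if not 0.0 <= coremoval_fraction <= 1.0:
        raise ValueError("coremoval_fraction must be in [0, 1]")
    # Each remaining protein's share of the load rises by 1 / (1 - depleted_fraction),
    # then is reduced by whatever was carried away nonspecifically.
    return (1.0 - coremoval_fraction) / (1.0 - depleted_fraction)


if __name__ == "__main__":
    print(loading_gain(0.5))       # albumin removed (about 50% of plasma protein, Table 1)
    print(loading_gain(0.5, 0.3))  # same depletion with a hypothetical 30% coremoval
```

Under these assumptions, removing albumin alone roughly doubles the load of each remaining protein, but a 30% coremoval loss cuts that gain to 1.4-fold, which is why coremoval should be checked for the proteins of interest.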

2.3. Tissues and Other Samples

Tissues are usually used as primary screening samples to find direct causes of disease from the lesion in the corresponding organ, for example, liver tissue in hepatocellular carcinoma (HCC) (44,45). Tissues are widely used for clinical proteomics, although there are no standard operating procedures for specimen fractionation and the detection limit of current instrumentation remains borderline. As listed in Table 1, cancer tissues can be prepared in different ways, such as laser capture microdissection (LCM) (5,6), pressure catapulting [laser microdissection and pressure catapulting (LMPC)] (30,46), and formalin-fixed paraffin-embedded sample preparation (11). These techniques are described in detail in Chapters 3, 5, 9, and 11 of this volume. It is desirable, however, that proteomic studies of diseased tissue be coupled with parallel analysis of the corresponding body fluids. For example, in studies of cancer biomarkers, paired cancer tissue sets (tumor vs. nontumor) and the same patient’s plasma were used, which led to a more comprehensive analysis (47,48). Because of their complexity, tissue samples may be better suited to pathophysiological studies than to biomarker discovery.

Specimen processing for proteomic studies often introduces unwanted problems, such as artifacts created during sample collection, processing, and storage. Other issues arise in handling patient information on sex, age, and race (49). To minimize these problems through systematic sample handling, it is reasonable to establish a specimen bank (50,51,52). In fact, collecting many clinical samples in a biorepository would greatly benefit proteomic research. A biorepository enables the selection of homogeneous clinical samples according to the research purpose and the isolation of specific components from those samples. Additionally, large-scale collection of clinical specimens in a biorepository is essential for validating specific markers after biomarker candidates are discovered. Ideally, clinical samples stored in a biorepository should be (1) collected and stored immediately, because dead cells and altered proteins affect proteomic analysis, (2) subjected to accurate quality control, and (3) catalogued with reliable and secure clinical data. Quality control of clinical samples includes trimming of specimens and confirmation of diagnosis by pathologists; the information gained (such as the tumor cell to stromal cell ratio, percentage of necrosis, percentage of fibrosis, proportion of infiltrating inflammatory cells, etc.) should be stored in a clinical sample database. It is also essential to store clinical and follow-up data for each sample, together with each patient’s written informed consent form, in the biorepository network. Such a clinical specimen banking network offers convenience, lower cost, and reliability to researchers in clinical proteomics (50,51,52).

Diaz et al. (Chapter 3) describe a practical experimental strategy for the storage and handling of representative tissue specimens used in surface-enhanced laser desorption/ionization (SELDI), 2D gel, and liquid chromatography (LC)-based proteomics. The primary responsibility of pathologists throughout the whole process of tissue proteomics, in addition to morphological analysis at the molecular level, should be emphasized.
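
The quality-control items listed for a biorepository map naturally onto a structured record. The sketch below assumes Python dataclasses; the field names and the selection thresholds (70% tumor cells, 10% necrosis, 30 minutes to freezing) are hypothetical illustrations, not recommendations from this chapter or its references.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SpecimenRecord:
    """One biorepository entry; field names are hypothetical, mirroring the QC items above."""
    specimen_id: str
    diagnosis: str
    diagnosis_confirmed_by_pathologist: bool
    informed_consent_on_file: bool
    tumor_cell_percent: float
    necrosis_percent: float
    fibrosis_percent: float
    inflammatory_cell_percent: float
    minutes_to_freezing: float  # time from collection to storage


def select_homogeneous(records: Iterable[SpecimenRecord], diagnosis: str,
                       min_tumor_percent: float = 70.0,
                       max_necrosis_percent: float = 10.0,
                       max_minutes_to_freezing: float = 30.0) -> List[SpecimenRecord]:
    """Return records that meet illustrative, assumed thresholds for a homogeneous sample set."""
    return [
        r for r in records
        if r.diagnosis == diagnosis
        and r.diagnosis_confirmed_by_pathologist
        and r.informed_consent_on_file
        and r.tumor_cell_percent >= min_tumor_percent
        and r.necrosis_percent <= max_necrosis_percent
        and r.minutes_to_freezing <= max_minutes_to_freezing
    ]


if __name__ == "__main__":
    bank = [
        SpecimenRecord("S1", "HCC", True, True, 85.0, 5.0, 10.0, 5.0, 20.0),
        SpecimenRecord("S2", "HCC", True, True, 40.0, 30.0, 20.0, 10.0, 15.0),
        SpecimenRecord("S3", "HCC", False, True, 90.0, 2.0, 5.0, 3.0, 10.0),
    ]
    print([r.specimen_id for r in select_homogeneous(bank, "HCC")])
```

Keeping the pathologist's findings as explicit fields, rather than free text, is what makes a query such as "confirmed HCC with at least 70% tumor cells" possible when a validation cohort has to be assembled.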

3. Biomarker Discovery and Clinical Proteomics

Because biomarker discovery and its application are central issues in clinical proteomics, a brief account of this subject is appropriate here. Excellent reviews of the whole arena of biomarker development can be found elsewhere (53,54,55). Until now, a disease biomarker has conventionally been conceived as a single protein or peptide with high specificity that is usually present in low abundance, is expressed in a stage-specific manner, and serves as a major fingerprint of the body’s response to drugs or other treatments. Although many broad biomarkers for various diseases are known (56,57,58,59,60), more specific and selective biomarkers are urgently needed. Accordingly, we may also need to change the current biomarker concept and eliminate the inherent bias toward individual disease biomarkers. Recently, the idea has been introduced that an ensemble of different proteins would be more efficient than a single protein/peptide in the diagnosis of disease (61,62,63). To address this problem, we propose a general strategy of clinical proteomics leading to disease biomarker discovery, as outlined in Fig. 3.

Because biomarker candidate proteins can come from many different cellular processes, they may be present in either low or high abundance and may reflect the physiological condition of the body directly or indirectly. Their concentrations may also differ depending on the disease stage or tissue type. For example, common proteins such as Hsp 27 (64,65), 14-3-3 proteins (66,67), apoA-I (68,69), and serum amyloid precursor A (70) appear in most disease samples from lung cancer, gastric cancer, pancreatic cancer, prostate cancer, neuroblastoma, and inflammation. A number of questions then arise. Should these proteins be treated as disease-specific or disease-nonspecific? What criterion should be used to make this decision? Is this because the number and types of proteins secreted under the physiological conditions of many different diseases might be similar? How can one distinguish one type of disease from another simply by looking at protein profiles?

Fig. 3. The concept of creating a protein biomarker panel for a specific disease. Each white, gray, dark-gray, and black circle represents a putative protein biomarker of a specific disease at that clinical stage. A group of slash-lined circles symbolizes the biomarker panel of liver disease as an example.

As outlined in Fig. 3, at the onset of a disease, signals at early stages may be limited to only a few easily counted molecules. As the disease progresses, more signal molecules may be produced, resulting in mixed types of biomarkers representing multiple disease phenomena. Although this picture is probably oversimplified, at a certain stage more noise is created and the relevant molecules become more difficult to identify, for two reasons: (1) they are present in amounts too small to be detected with current technology, and (2) it may be too early for the molecules to be specific for a particular disease. Proteins appearing at stage 3 or 4 may be more specific for a particular disease, but their sensitivity might be low. This noise may also obscure the signaling pathway of a given disease, so that we may end up with no decisive marker. To circumvent this problem, it may be desirable to identify a set of biomarker candidate proteins, termed a “biomarker panel,” which ideally contains candidate proteins or peptides that, as a group, represent specific stages of the disease. Given such a panel, extensive validation can be pursued in a large cohort. Following this strategy, many biomarker candidates at stage 1 can be included in the panel, which may then achieve higher specificity and sensitivity than a single-molecule biomarker. Such a panel can be used not only for diagnosis but also as a prognostic indicator for monitoring treatment effectiveness. For example, Linkov et al. (61) reported that sensitivity and specificity improved to 84.5% and 98%, respectively, when a panel of 25 markers was used for the early diagnosis of head and neck cancer (squamous cell cancer of the head and neck). In the diagnosis of prostate cancer, specificity increased from 5–15% with a single marker to 84–95% with a biomarker panel containing six marker proteins. In HCC, studies have been carried out on a biomarker panel in the form of a protein array that can be used as a diagnostic kit (62,63).
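
Sensitivity and specificity, the two figures quoted for these panels, are simple ratios from a confusion matrix. The sketch below computes them for a hypothetical "k-of-n" panel rule, in which a sample is called positive when at least k markers exceed their cutoffs. The rule, the marker values, and the cutoffs are illustrative assumptions, not the method used by Linkov et al. or by the prostate cancer and HCC studies.

```python
from typing import Sequence, Tuple


def panel_call(values: Sequence[float], thresholds: Sequence[float], min_positive: int) -> bool:
    """Call a sample positive when at least `min_positive` markers exceed their cutoffs.

    This k-of-n rule is only one hypothetical way to combine markers; published
    panels may use other (e.g., trained multivariate) decision rules.
    """
    if len(values) != len(thresholds):
        raise ValueError("one threshold per marker is required")
    hits = sum(1 for v, t in zip(values, thresholds) if v > t)
    return hits >= min_positive


def sensitivity_specificity(calls: Sequence[bool], truth: Sequence[bool]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Assumes both disease (True) and control (False) samples are present in `truth`.
    """
    if len(calls) != len(truth):
        raise ValueError("calls and truth must have the same length")
    tp = sum(1 for c, t in zip(calls, truth) if c and t)
    fn = sum(1 for c, t in zip(calls, truth) if not c and t)
    tn = sum(1 for c, t in zip(calls, truth) if not c and not t)
    fp = sum(1 for c, t in zip(calls, truth) if c and not t)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical three-marker measurements; True in `truth` means disease confirmed.
    cutoffs = [1.0, 1.0, 1.0]
    samples = [[2.0, 1.5, 0.2], [1.2, 0.4, 0.3], [0.5, 0.6, 0.9], [1.5, 0.2, 0.1]]
    truth = [True, True, False, False]
    for k in (1, 2):
        calls = [panel_call(s, cutoffs, k) for s in samples]
        print(k, sensitivity_specificity(calls, truth))
```

On these four toy samples, requiring one marker gives sensitivity 1.0 and specificity 0.5, while requiring two gives 0.5 and 1.0: how markers are combined shifts the balance between the two measures, which is why a panel has to be validated as a whole.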

A general strategy for biomarker discovery is outlined in Fig. 4. In typical clinical proteomics work, sample collection is the first step, followed by sample pretreatment with various fractionation tools to reduce sample complexity and enable the search for low-abundance proteins (e.g., disease biomarkers). This multidimensional fractionation is well described elsewhere (34,35,36) and depends on the properties and concentration of the sample. Typically, the prefractionated samples go to either a two-dimensional electrophoresis (2DE) or an LC-based separation system, followed by single or multiple steps of mass spectrometric analysis depending on the sample quantity and experimental goal. The data obtained from this series of analyses are integrated into a proteome informatics system, where protein/peptide identification, quantification, modification analysis, and verification of peak lists are carried out [(71) and also Chapter 19]. This step usually becomes rate limiting, since the major profiling data are constructed and analyzed at this point. It is mostly here that the clinical relevance of these proteins (and of changes in their expression levels) in a specific disease state is determined, which eventually leads to the identification of biomarker candidates. In addition, SELDI, molecular imaging, and protein microarrays can be applied before or after this step. Once major biomarker candidates are identified, they are subjected to further verification via sophisticated analytical arrays and translational proteomics, which involves cohort studies, pre-evaluation, and a robust analytical system (4,72). Throughout translational proteomics, one may be able to judge whether the identified panel or single proteins are suitable biomarkers for a specific disease. A recent comprehensive review by Zolg (73) addressed several considerations in the biomarker development pipeline from discovery to validation. Three critical challenges within the pipeline are reduction of clinical sample complexity, proof of principle of biomarker function, and the detection limit for unique proteins present in the samples.

Fig. 4. A typical experimental strategy for clinical proteomics and translational proteomics. Clinical proteomics research includes various experimental techniques: specimen collection, prefractionation, 2DE, non-2DE (liquid-based separation), mass spectrometry, informatics, and others. The path through each section (squares and circles in different colors) is chosen by the investigators depending on the experimental goal. At the bottom, experimental procedures for the verification and validation of biomarker candidates are outlined schematically, leading to clinical screening and applications. The squares indicate separation systems based on specific characteristics of proteins and general prefractionation systems. The open circles and the open triangle represent analytical modules at the protein and peptide level, respectively. The arrows and junction points indicate the options at each step. The bottom part indicates verification procedures employing multiple reaction monitoring and quantitative mass analysis. Biomarker candidates identified by typical clinical proteomics are then subjected to translational proteomics for validation, in which large-scale cohort studies and evaluation proceed.

In the search for biomarker panels, reliable statistical tools and bioinformatics resources are needed; many are now available on the web (Table 2; see also Chapters 16 and 17). As the number of biomarker panel candidates increases and more cases are examined, statistical learning methods become necessary. These methods include neural networks, genetic algorithms, k-means and nearest-neighbor analysis, Euclidean distance-based nonlinear methods, fuzzy pattern matching, self-organizing maps, and support vector machines (74,75,76,77,78). They are very useful for classifying proteins according to the specific disease state (see also Chapters 16 and 20). Once biomarker candidates are identified, their functions need to be predicted in silico and the candidates validated in the context of clinical application. Table 3 lists web resources that can be used for clinical data management, in silico functional annotation (see Chapter 18), prediction, and identification of modified forms of proteins. Thus, by combining experimental methods (Fig. 4) and informatics tools (Tables 2 and 3), one can obtain a set of biomarker candidate proteins (a panel) for further validation through translational proteomics (Fig. 1).
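
Of the statistical learning methods listed, nearest-neighbor classification with Euclidean distance is the easiest to show in a few lines. The sketch below is a generic textbook k-nearest-neighbor classifier, not the implementation used in the cited studies or chapters; the two-protein profiles and their labels are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Profile = Sequence[float]


def knn_classify(profile: Profile, training: List[Tuple[Profile, str]], k: int = 3) -> str:
    """Assign a disease label to a protein-abundance profile by k-nearest-neighbor vote.

    Distances are Euclidean; profiles are assumed to be on comparable scales
    (e.g., log-transformed and standardized) so that no single protein dominates.
    """
    if not 1 <= k <= len(training):
        raise ValueError("k must be between 1 and the number of training profiles")
    # Sort training profiles by distance to the query and keep the k closest.
    nearest = sorted(training, key=lambda item: math.dist(profile, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    # most_common keeps first-seen order for equal counts, so a tie goes to
    # the label whose neighbor is closest.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical two-protein profiles (log abundance) with known diagnoses.
    training = [
        ([1.0, 0.2], "control"), ([1.1, 0.1], "control"), ([0.9, 0.3], "control"),
        ([3.0, 2.0], "disease"), ([3.2, 2.2], "disease"), ([2.8, 1.9], "disease"),
    ]
    print(knn_classify([3.1, 2.1], training))
    print(knn_classify([1.0, 0.15], training))
```

In practice the profiles would hold many proteins and k would be tuned by cross-validation; scaling matters because Euclidean distance otherwise favors the most abundant proteins.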

4. Introduction of the Experimental Strategy Described in This Volume

Proteomics platform technologies for protein profiling and identification are advancing in many areas, not only in clinical proteomics but also in biology generally. In the chapters introduced in this section, leading scientists in proteomics outline core techniques and their application to clinical proteomics. For example, in plasma proteome analysis, high-abundance proteins must be depleted using techniques such as multidimensional fractionation with immunoaffinity columns, gel permeation, and beads (Fig. 4). Cho et al. (Chapter 4) address this in relation to 2D gel analysis of plasma, describing the technical details of sample preparation, gel electrophoresis, and quantification of proteins on the gel. Zhang and Koay (Chapter 5) describe methods of 2D gel analysis for cells prepared by LCM, including the application of LCM to dissect tumor cells in breast cancer for macromolecular extraction and 2D gels. This approach can also be used to prepare samples by microdissecting the cells of interest from paraffin-embedded tissue blocks. Extending this procedure, Mustafa et al. (Chapter 9) review the application of LCM to proteomic analysis and demonstrate that combining LCM and MS facilitates the identification of specific proteins for each sample type. For urine analysis, Zerefos et al. (Chapter 8) provide simple protocols for protein analysis by 2D gel or direct matrix-assisted laser desorption/ionization-time-of-flight mass spectrometry. These protocols include protein enrichment by precipitation and ultrafiltration. Combining these methods with the profiling technologies above allows reproducible and sensitive analysis of one of the most significant and complex biological samples (77).


Table 2
Clinical Proteomics Initiatives and Resources

Institutes
• CPTI: National Cancer Institute’s Clinical Proteomics Technologies initiative for cancer. http://proteomics.cancer.gov
• ABRF: The Association of Biomolecular Resource Facilities, an international society dedicated to advancing core and research biotechnology laboratories through research, communication, and education. http://www.abrf.org/
• PPI: Plasma Proteome Institute; the PPI is working to facilitate clinical adoption of advanced diagnostic tests using proteins in plasma and serum. http://www.plasmaproteome.org/plasmaframes.htm
• EDRN: The Early Detection Research Network; the EDRN provides up-to-date information on biomarker research through its website and scientific publications. http://edrn.nci.nih.gov

Web resources
• ExPASy: Expert Protein Analysis System; proteomics-related information and databases. http://www.expasy.org/
• NCBI: National Center for Biotechnology Information; the protein entries in the Entrez search and retrieval system have been compiled from a variety of sources, including SwissProt, PIR, PRF, PDB, and translations from annotated coding regions in GenBank and RefSeq. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein&itool=toolbar
• CPRMap: Clinical Proteomics Research Map; updated research articles on disease and clinical proteomics. http://www.cprmap.com/

Database
• MedGene: MedGene can compile a ranked list of human genes associated with a particular human disease. http://hipseq.med.harvard.edu/MEDGENE
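
Several sequence-based tools in Table 3 (for example, Prosite and NetNGlyc) start from known sequence motifs. As a minimal illustration, the sketch below scans a protein sequence for the PROSITE N-glycosylation pattern N-{P}-[ST]-{P} using a regular expression. A bare motif match only flags a candidate site; this is not how NetNGlyc itself makes its predictions, and the example sequence is hypothetical.

```python
import re
from typing import List

# PROSITE pattern PS00001 (N-glycosylation site): N-{P}-[ST]-{P}.
# The zero-width lookahead lets overlapping sites (e.g., "NNST...") all be reported.
N_GLYC_SEQUON = re.compile(r"(?=N[^P][ST][^P])")


def find_n_glycosylation_motifs(sequence: str) -> List[int]:
    """Return 1-based positions of the Asn residue in each N-{P}-[ST]-{P} motif."""
    return [match.start() + 1 for match in N_GLYC_SEQUON.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical sequence: only the Asn at position 2 starts a complete motif;
    # the Asn at 5 is followed by Pro and the Asn at 8 has Pro in the fourth position.
    print(find_n_glycosylation_motifs("ANASNPTNGSP"))
```

Because the motif occurs frequently by chance, such matches mark only potential sites; whether a site is actually glycosylated has to be established experimentally.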


Table 3
Available Bioinformatic Resources for the Analysis of Proteomics Data
Name: description. Website URL (PMID)

Clinical proteome data management system
• Proteus: LIMS for a proteomics pipeline. http://www.genologics.com
• CPAS: LIMS for identification and quantification using LC-MS/MS data
• Systems biology experiment analysis management system: a management system for collecting, storing, and accessing data produced by microarray, proteomics, and immunohistochemistry. http://www.sbeams.org/
• GPM database: open source system for analyzing, validating, and storing protein identification data. http://www.thegpm.org/
• SpectrumMill: MS/MS data analysis and management system. http://www.chem.agilent.com/
PMIDs for this group: 16396501, 16756676, 15595733

Phosphorylation
• Group-based phosphorylation scoring method: prediction of kinase-specific phosphorylation sites. http://973-proteinweb.ustc.edu.cn/gps/gps_web/ (PMID 15980451)
• KinasePhos: a web tool for identifying protein kinase-specific phosphorylation sites using a hidden Markov model. http://kinasePhos.mbc.nctu.edu.tw (PMID 15980458)
• NetPhos: sequence- and structure-based prediction of eukaryotic protein phosphorylation sites. http://www.cbs.dtu.dk/services/NetPhos/ (PMID 10600390)
• NetPhosK: prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence. http://www.cbs.dtu.dk/services/NetPhosK/ (PMID 15174133)


• PredPhospho: prediction of phosphorylation sites using support vector machines. http://pred.ngri.re.kr/PredPhospho.htm
• PREDIKIN: prediction of substrates for serine/threonine protein kinases based on the primary sequence of a protein kinase catalytic domain. http://florey.biosci.uq.edu.au/kinsub/home.htm
• Prosite: prediction of substrates for protein kinases based on conserved motif search. http://kr.expasy.org/prosite
• Scansite: prediction of PK-specific phosphorylation sites with Bayesian decision theory. http://scansite.mit.edu
• Phospho.ELM: a database of experimentally verified phosphorylation sites in eukaryotic proteins. http://phospho.elm.eu.org/
• Human protein reference database (HPRD): a database of known kinase/phosphatase substrates as well as binding motifs curated from the published literature. http://www.hprd.org/PhosphoMotif_finder
• PhosphoSite: a bioinformatics resource dedicated to physiological protein phosphorylation. http://www.phosphosite.org/Login.jsp

Glycosylation
• NetOGlyc 2.0: predicts O-glycosylation sites in mucin-type proteins. http://www.cbs.dtu.dk/services/NetOGlyc/
• DictyOGlyc 1.1: predicts O-GlcNAc sites in eukaryotic proteins. http://www.cbs.dtu.dk/services/DictyOGlyc/
• YinOYang 1.2: predicts O-GlcNAc sites in eukaryotic proteins. http://www.cbs.dtu.dk/services/YinOYang/
• NetNGlyc 1.0: predicts N-glycosylation sites. http://www.cbs.dtu.dk/services/NetNGlyc/
• GlycoMod: web software for predicting the possible oligosaccharide structures in glycoproteins from their experimentally determined masses. http://www.expasy.ch/tools/glycomod/
PMIDs for the entries from PredPhospho to GlycoMod: 15231530, 16445868, 17237102, 16549034, 15212693, 15174125, 9557871, 10521537, 16316981, 11680880


18 Paik et al.<br />

Table 3<br />

(Continued)<br />

Name Description Website URL PMID<br />

Glyco-fragment<br />

GlycoSearchMS<br />

GlycosidIQ<br />

Saccharide topology analysis tool<br />

GlycoX<br />

MODi<br />

SWEET-DB<br />

A web tool to support the interpretation of mass spectra of complex carbohydrates<br />

Compares each peak of a measured mass spectrum with the calculated fragments of all structures contained in the SweetDB<br />

Based on the matching of experimental MS2 data with the theoretical fragmentation of glycan structures in GlycoSuiteDB<br />

A web-based computational program that can quickly extract sequence information from a set of MSn spectra for an oligosaccharide of up to 10 residues<br />

Determines the glycosylation sites and oligosaccharide heterogeneity of glycoproteins simultaneously, using MATLAB<br />

A web server for identifying multiple post-translational peptide modifications from tandem mass spectra<br />

An attempt to create annotated data collections for carbohydrates<br />

Protein–protein interaction<br />

Munich Information Center for Protein Sequences MPPI<br />

The database of mammalian protein–protein interactions<br />

http://www.dkfz.de/spec/projekte/fragments/<br />

14625865<br />

http://www.dkfz.de/spec/glycosciences.de/sweetdb/ms/ 15215392<br />

https://tmat.proteomesystems.com/glyco/glycosuite/glycodb 15174134<br />

http://www.unimod.org<br />

http://www.dkfz.de/spec2/sweetdb/<br />

10857602<br />

17022651<br />

16845006<br />

11752350<br />

http://mips.gsf.de 16381839


Overview and Introduction to Clinical Proteomics 19<br />

Database of interacting proteins<br />

Molecular interaction network database<br />

Protein–protein interactions of cancer proteins<br />

IntAct<br />

Biomolecular interaction network database<br />

A database that documents experimentally determined protein–protein interactions<br />

A database storing, in a structured format, information about molecular interactions extracted from experimental details in work published in peer-reviewed journals<br />

Predicts interactions, which are derived from homology with experimentally known protein–protein interactions from various species<br />

IntAct provides a freely available, open source database system and analysis tools for protein interaction data<br />

A database designed to store full descriptions of interactions, molecular complexes, and pathways<br />

http://dip.doe-mbi.ucla.edu/<br />

http://mint.bio.uniroma2.it/mint<br />

http://bmm.cancerresearchuk.org/~pip<br />

http://www.ebi.ac.uk/intact/<br />

Metabolic and signal pathway<br />

BioCarta A pathway database http://www.biocarta.com<br />

KEGG<br />

Cancer cell map<br />

HPRD<br />

A pathway database with genomic, chemical, and biological network information<br />

The cancer cell map is a selected set of human cancer-focused pathways<br />

A database with data pertaining to post-translational modifications, protein–protein interactions, tissue expression,<br />

11752321<br />

17135203<br />

16398927<br />

17145710<br />

http://www.bind.ca 12519993<br />

http://www.genome.jp/kegg<br />

http://cancer.cellmap.org/cellmap/<br />

http://www.hprd.org/<br />

16381885<br />

subcellular localization, and enzyme–substrate relationships<br />

Proteomic data resource<br />

The cancer cell map<br />

Proteomics identifications database<br />

PeptideAtlas<br />

Disease resource<br />

Online Mendelian Inheritance in Man<br />

GeneCards<br />

Cancer Gene Census<br />

A database of clinical data from SELDI-TOF<br />

A database of protein and peptide identifications that have been described in the scientific literature<br />

A multiorganism, publicly accessible compendium of peptides identified in a large set of tandem mass spectrometry proteomics experiments<br />

A database of human genes and genetic disorders<br />

An integrated database of human genes that includes automatically mined genomic, proteomic, and transcriptomic information<br />

A catalogue of those genes for which mutations have been causally implicated in cancer<br />

http://home.ccr.cancer.gov/ncifdaproteomics/ppatterns.asp<br />

http://www.ebi.ac.uk/pride/<br />

http://www.peptideatlas.org<br />

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM<br />

http://www.genecards.org/index.shtml<br />

http://www.sanger.ac.uk/genetics/CGP/Census/<br />

16381953<br />

16381952<br />

17170002<br />

15608261<br />

14993899<br />
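Several resources in Table 3, such as PROSITE, work from conserved sequence patterns. As a rough illustration only, the sketch below scans a protein sequence for the PROSITE N-glycosylation pattern N-{P}-[ST]-{P} (PS00001). It is a plain consensus-pattern match, not the neural-network method that NetNGlyc actually uses, and the example sequence is a made-up toy string.

```python
import re

# PROSITE PS00001-style pattern N-{P}-[ST]-{P}: Asn, any residue except Pro,
# Ser or Thr, then any residue except Pro. The zero-width lookahead lets
# overlapping candidate sites all be reported.
SEQUON = re.compile(r"(?=N[^P][ST][^P])")


def find_sequons(sequence: str) -> list[int]:
    """Return 1-based positions of Asn residues that start a candidate sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


# Hypothetical toy sequence (not a real protein).
print(find_sequons("MKNVTLLNPSAANGSQNNTW"))
```

In the toy sequence the Asn at position 8 is not reported because it is followed by Pro, and a match at a real site is only a candidate: whether it is actually glycosylated has to be shown experimentally.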

Two-dimensional electrophoresis (2DE) is perhaps the most widely used starting point<br />

for proteome analysis. In clinical proteomics, 2DE has been the traditional workhorse<br />

for analyzing a wide range of clinical specimens, from plasma to urine (Table 1).<br />

Quantification problems in 2DE have been largely addressed by fluorescent dyes<br />

(Cy3 and Cy5), which allow normalization of data obtained from two different<br />

clinical specimens run on the same gel (79). Friedman and Lilley (Chapter 6)<br />

present general optimization conditions for difference gel electrophoresis (DIGE)<br />

in the quantitative analysis of clinical samples and discuss the use of the<br />

differential labeling dyes Cy2, Cy3, and Cy5. The essence of any DIGE system is<br />

to minimize potential human error in identifying and quantifying the protein spots<br />

resolved on a 2D gel (79). Marengo et al. (Chapter 16) introduce the difficulties of<br />

2D map analysis and describe methods for comparing protein spots using image<br />

analysis technology and related informatics tools to minimize variation between<br />

measurements of spot volume, a key to successful 2D map construction.<br />
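In a common DIGE design, a pooled internal standard labeled with Cy2 is run on every gel, and each sample spot (Cy3 or Cy5) is expressed relative to that standard before gels are compared. The sketch below assumes that design, already-matched spot IDs, and hypothetical spot volumes; it computes log2(sample/standard) per spot and centers the values on their median within the gel. The median centering is a simple normalization chosen for illustration, not the algorithm of any particular DIGE software.

```python
import math
from statistics import median


def standardized_log_abundance(sample: dict[str, float],
                               internal_std: dict[str, float]) -> dict[str, float]:
    """log2(sample / internal standard) per spot, median-centred within one gel.

    `sample` holds Cy3 or Cy5 spot volumes; `internal_std` holds the Cy2
    volumes of the same spots on the same gel. Spots missing from the
    standard are skipped.
    """
    ratios = {spot: math.log2(sample[spot] / internal_std[spot])
              for spot in sample if spot in internal_std}
    centre = median(ratios.values())
    # Subtracting the median removes a gel-wide labelling/scanning offset.
    return {spot: r - centre for spot, r in ratios.items()}


# Hypothetical spot volumes for three matched spots on one gel.
cy2 = {"s1": 1000.0, "s2": 2000.0, "s3": 500.0}
cy3 = {"s1": 2000.0, "s2": 4000.0, "s3": 4000.0}
print(standardized_log_abundance(cy3, cy2))
```

Here spots s1 and s2 both carry the same twofold offset relative to the standard, which the centering removes, leaving s3 at a log2 value of 2 (fourfold). Because every gel shares the same Cy2 standard, values computed this way on different gels can then be compared spot by spot.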

LC-based protein profiling varies in many respects, including mass detection methods,<br />

column types, database searching with search engines, mass accuracy, and running<br />

conditions (80,81,82). All of these affect the quantification of proteins or peptides<br />

in a sample, one of the major bottlenecks in proteomics (83,84,85,86,87).<br />

Quantification techniques include isotope-coded affinity tags (ICAT), mass-coded<br />

affinity tagging, and methods that do not rely on isotope labels. Xiao and Veenstra<br />

(Chapter 10) apply ICAT to proteins regulated by a COX-2 inhibitor in a colon<br />

cancer cell line and, with an emphasis on sample preparation, detail ICAT<br />

procedures for quantitative proteomics (88). Li et al. (Chapter 11) take a related<br />

approach, combining LCM for the preparation of HCC samples with cleavable<br />

isotope-coded affinity tags to identify those markers quantitatively. Note, however,<br />

that ICAT suffers from reduced sample recovery during or after the labeling steps,<br />

so further measures are needed to improve its efficiency (87). A label-free serum<br />

quantification method has also been introduced recently (48) (see Chapter 12 by<br />

Higgs et al.).<br />
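In ICAT-type experiments each peptide is quantified as a pair of light- and heavy-labeled peaks, and a protein-level ratio is then summarized from its peptides. The sketch below assumes hypothetical light/heavy intensities for peptides already assigned to proteins and reports the median log2(heavy/light) ratio per protein; the median is one common robust summary, not necessarily the one used in the chapters cited above.

```python
import math
from collections import defaultdict
from statistics import median


def protein_ratios(peptide_pairs):
    """Summarise light/heavy peptide pairs into per-protein log2(heavy/light).

    `peptide_pairs` is an iterable of (protein_id, light_intensity,
    heavy_intensity) tuples. Pairs with a non-positive intensity are skipped
    because a log ratio is undefined for them.
    """
    per_protein = defaultdict(list)
    for protein, light, heavy in peptide_pairs:
        if light > 0 and heavy > 0:
            per_protein[protein].append(math.log2(heavy / light))
    # The median damps the influence of a single badly quantified peptide.
    return {protein: median(logs) for protein, logs in per_protein.items()}


# Hypothetical peptide-level measurements (not real data).
pairs = [
    ("P1", 100.0, 200.0),
    ("P1", 50.0, 100.0),
    ("P1", 80.0, 640.0),   # outlying peptide
    ("P2", 300.0, 150.0),
    ("P2", 40.0, 0.0),     # heavy peak missing: skipped
]
print(protein_ratios(pairs))
```

With these numbers P1 comes out at a log2 ratio of 1 despite the outlying peptide, and P2 at -1 from its single usable pair; a mean would have pulled P1 upward.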

The use of antibody arrays in clinical proteomics has recently increased for the<br />

high-throughput analysis of cancer specimens in which the identities of the proteins<br />

of interest are known (89,90). Evaluating antibody cross-reactivity and specificity<br />

is crucial in these assays. This issue is addressed by Sanchez-Carbayo (Chapter 15),<br />

who describes the technical aspects and applications of planar antibody arrays for<br />

quantifying serum proteins, and by Hsu et al. (Chapter 14), who describe the<br />

development and use of bead-based, miniaturized, multiplexed sandwich immunoassays<br />

for focused protein profiling in various body fluids. The latter method, using<br />

bead-based protein arrays or suspension microarrays, allows many parameters to be<br />

analyzed simultaneously within a single experiment. Because the suspension<br />

microarray is versatile enough to measure proteins of interest in body fluids<br />

ranging from serum to synovial fluid, the multiplexed protein profiling technology<br />

described by Hsu et al. (Chapter 14) seems to hold great promise for clinical<br />

proteomics. Similarly, in combination with tissue microarray technology (91), it<br />

would also be possible to perform parallel molecular profiling of clinical samples<br />

together with immunohistochemistry, fluorescence in situ hybridization, or RNA in<br />

situ hybridization. SELDI is another approach to high-throughput profiling of clinical<br />

samples for disease marker discovery [(92,93), Chapter 7]. Profiling approaches<br />

such as SELDI-MS are expected to be used frequently in disease marker discovery,<br />

but only if the identification technologies coupled with SELDI are improved.<br />
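Bead-based sandwich immunoassays such as those described by Hsu et al. yield a fluorescence signal per analyte, which is converted to a concentration through a standard curve. The four-parameter logistic (4PL) model is a widely used choice for such curves. The sketch below assumes the 4PL parameters have already been fitted to the standards (the values shown are hypothetical) and simply inverts the curve for unknown samples; curve fitting and quality control are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FourPL:
    """Four-parameter logistic curve y = d + (a - d) / (1 + (x / c) ** b).

    a: signal at zero concentration, d: signal at saturating concentration,
    c: concentration at the curve midpoint, b: slope factor.
    """
    a: float
    b: float
    c: float
    d: float

    def signal(self, conc: float) -> float:
        return self.d + (self.a - self.d) / (1.0 + (conc / self.c) ** self.b)

    def concentration(self, signal: float) -> float:
        """Invert the curve; defined only strictly between a and d."""
        lo, hi = sorted((self.a, self.d))
        if not lo < signal < hi:
            raise ValueError("signal outside the quantifiable range of the curve")
        return self.c * ((self.a - self.d) / (signal - self.d) - 1.0) ** (1.0 / self.b)


# Hypothetical parameters, as if fitted to one analyte's standards.
curve = FourPL(a=50.0, b=1.2, c=120.0, d=20000.0)
mfi = curve.signal(300.0)
print(round(curve.concentration(mfi), 6))
```

Raising an error outside the (a, d) interval, rather than extrapolating, reflects the usual practice of reporting such samples as below or above the quantifiable range.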

Biomarker discovery usually generates large data sets, which are deposited in<br />

a coordinated fashion (Tables 2 and 3) (94,95). Statistical analysis of 2DE data,<br />

in which each gel can yield several hundred protein spots, is complex. To overcome<br />

some of the inconsistency in 2D gel proteomics data, Friedman and Lilley (Chapter 6)<br />

and Carpentier et al. (Chapter 17) point out available statistical tools and suggest<br />

case-specific guidelines for 2D gel spot analysis. Fitzgibbon et al. (Chapter 19)<br />

describe an open source platform for LC-MS spectra in which the msInspect program<br />

is used to reduce false positives and guide normalization of the data set; they also<br />

show that msInspect can analyze data from quantitative studies with and without<br />

isotopic labels. Paliakasis et al. (Chapter 18) introduce web-based tools for protein<br />

classification that support prediction of potential protein function and clustering of<br />

related proteins into families, and they provide guidelines for classifying protein<br />

data into more meaningful families. Finally, Somorjai (Chapter 20) addresses<br />

important filtering criteria for applying protein pattern recognition to biomarker<br />

discovery using statistical tools.<br />
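When several hundred spots (or thousands of LC-MS features) are tested at once, some per-feature p-values will fall below 0.05 by chance alone. One widely used safeguard, offered here as an illustration rather than as the procedure recommended in the chapters cited above, is the Benjamini-Hochberg false discovery rate adjustment. The sketch assumes per-spot p-values (for example from t-tests) have already been computed; the values shown are made up.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    # and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for five gel spots.
p = [0.001, 0.006, 0.02, 0.2, 0.9]
q = benjamini_hochberg(p)
print([round(x, 4) for x in q])
print([i for i, x in enumerate(q) if x < 0.05])
```

With these values the first three spots remain significant at a 5% false discovery rate; a Bonferroni correction (multiplying each p-value by 5) would keep only the first two, which is why FDR control is often preferred for exploratory spot-level screens.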

5. Concluding Remarks<br />

Although clinical proteomics faces several bottlenecks (such as the lack of<br />

standardized specimen processing, quantification, and an overall strategy for the<br />

steps that follow biomarker identification), we believe that the field holds great<br />

promise for biomarker discovery. The success of clinical proteomics depends on the<br />

availability and selection of well-phenotyped specimens, reduction of sample<br />

complexity, development of good informatics tools, and efficient data management.<br />

Sample handling techniques should therefore be developed accordingly, including<br />

microdissection for tissue samples, multidimensional fractionation for body fluids,<br />

and pretreatment of other clinical specimens (e.g., urine, tears, and cells). Because<br />

there is no gold standard for sample collection and handling, one needs to identify<br />

the best available options for processing samples without damaging them. In<br />

addition, establishing a biorepository system would systematically minimize<br />

artifacts and sample-to-sample variation during or after biomarker identification.


It is now generally accepted that an ensemble (or panel) of different proteins<br />

would be more effective than a single protein or peptide for diagnosing disease,<br />

an idea that is poised to replace the conventional single-marker concept of a biomarker.<br />

As a high-throughput means of protein profiling, antibody arrays have recently been<br />

used increasingly in clinical proteomics to analyze cancer specimens. However, when<br />

antibody arrays are used to profile serum autoantibodies, issues of cross-reactivity<br />

and specificity must be resolved. Although not covered here because of space<br />

limitations, proteomics techniques also make it possible to analyze the network of<br />

protein–protein interactions, as well as the post-translational modifications, of the<br />

proteins involved in a specific disease (Table 3). It is now highly recommended that<br />

common reagents, such as antibodies and standard proteins, which are very useful<br />

for spiking, quantification, and normalizing sensitivity from one instrument to<br />

another, be used in worldwide efforts such as the Human Proteome Organization<br />

(HUPO) Plasma Proteome Project (96,97). Finally, clinical proteomics needs to<br />

integrate biochemistry, pathology, analytical technology, bioinformatics, and<br />

proteome informatics to develop highly sensitive diagnostic tools for routine<br />

clinical care in the future (71,98).<br />
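The claim that a protein panel outperforms a single marker is usually checked with the area under the ROC curve (AUC). The sketch below computes AUC through its Mann-Whitney formulation, the probability that a randomly chosen case scores higher than a randomly chosen control, for one hypothetical marker and for a hypothetical panel score formed by simply summing two markers. The data and the unweighted-sum combination are illustrative assumptions; real panels are normally built with a fitted model and validated on independent samples.

```python
def roc_auc(scores_cases: list[float], scores_controls: list[float]) -> float:
    """AUC as P(case score > control score), counting ties as one half."""
    if not scores_cases or not scores_controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in scores_cases:
        for k in scores_controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(scores_cases) * len(scores_controls))


# Hypothetical (marker_a, marker_b) levels; not real patient data.
cases = [(5.0, 1.0), (2.0, 6.0), (6.0, 4.0), (3.0, 5.0)]
controls = [(4.0, 2.0), (1.0, 3.0), (3.5, 1.5), (2.5, 2.0)]

auc_single = roc_auc([a for a, _ in cases], [a for a, _ in controls])
# Naive panel score: unweighted sum of the two markers.
auc_panel = roc_auc([a + b for a, b in cases], [a + b for a, b in controls])
print(auc_single, auc_panel)
```

On these toy numbers the summed score reaches an AUC of 0.96875 against 0.6875 for marker A alone, but with only eight samples this says nothing about real diagnostic performance.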

Acknowledgments<br />

This study was supported by a grant from the Korea Health 21 R&D project,<br />

Ministry of Health & Welfare, Republic of Korea (A030003 to YKP).<br />

References<br />

1. Etzioni, R., Urban, N., Ramsey, S., McIntosh, M., Schwartz, S., Reid, B., Radich, J.,<br />

Anderson, G., and Hartwell, L. (2003) The case for early detection. Nat. Rev.<br />

Cancer 3, 1–10.<br />

2. Ludwig, J. A. and Weinstein, J. N. (2005) Biomarkers in cancer staging, prognosis<br />

and treatment selection. Nat. Rev. Cancer 5, 845–856.<br />

3. Xiao, Z., Prieto, D., Conrads, T. P., Veenstra, T. D., and Issaq, H. J. (2005)<br />

Proteomic patterns: their potential for disease diagnosis. Mol. Cell Endocrinol.<br />

230, 95–106.<br />

4. Rifai, N., Gillette, M. A., and Carr, S. A. (2006) Protein biomarker discovery<br />

and validation: the long and uncertain path to clinical utility. Nat. Biotechnol. 24,<br />

971–983.<br />

5. Emmert-Buck, M. R., Bonner, R. F., Smith, P. D., Chuaqui, R. F., Zhuang, Z.,<br />

Goldstein, S. R., Weiss, R. A., and Liotta, L. A. (1996) Laser capture microdissection.<br />

Science 274, 998–1001.<br />

6. Gillespie, J. W., Ahram, M., Best, C. J., Swalwell, J. I., Krizman, D. B.,<br />

Petricoin, E. F., Liotta, L. A., and Emmert-Buck, M. R. (2001) The role of tissue<br />

microdissection in cancer research. Cancer J. 7, 32–39.


7. Craven, R. A. and Banks, R. E. (2002) Use of laser capture microdissection to<br />

selectively obtain distinct populations of cells for proteomic analysis. Methods<br />

Enzymol. 356, 33–49.<br />

8. Vincourt, J. B., Lionneton, F., Kratassiouk, G., Guillemin, F., Netter, P.,<br />

Mainard, D., and Magdalou, J. (2006) Establishment of a reliable method for direct<br />

proteome characterization of human articular cartilage. Mol. Cell Proteomics 5,<br />

1984–1995.<br />

9. Platt, M. S., Agamanolis, D. P., Krill, C. E. Jr., Boeckman, C., Potter, J. L.,<br />

Robinson, H., and Lloyd, J. (1983) Occult hepatic sinusoid tumor of infancy<br />

simulating neuroblastoma. Cancer 52, 1183–1189.<br />

10. Mahadevia, P. J., Fleisher, L. A., Frick, K. D., Eng, J., Goodman, S. N., and<br />

Powe, N. R. (2003) Lung cancer screening with helical computed tomography<br />

in older adult smokers: a decision and cost-effectiveness analysis. JAMA 289,<br />

313–322.<br />

11. Hood, B. L., Darfler, M. M., Guiel, T. G., Furusato, B., Lucas, D. A.,<br />

Ringeisen, B. R., Sesterhenn, I. A., Conrads, T. P., Veenstra, T. D., and Krizman,<br />

D. B. (2005) Proteomic analysis of formalin-fixed prostate cancer tissue. Mol. Cell<br />

Proteomics 4, 1741–1753.<br />

12. Alaiya, A., Al-Mohanna, M., and Linder, S. (2005) Clinical cancer proteomics:<br />

promises and pitfalls. J. Proteome Res. 4, 1213–1222.<br />

13. Gericke, B., Raila, J., Sehouli, J., Haebel, S., Konsgen, D., Mustea, A., and<br />

Schweigert, F. J. (2005) Microheterogeneity of transthyretin in serum and ascitic<br />

fluid of ovarian cancer patients. BMC Cancer 17, 133–141.<br />

14. Swisher, E. M., Wollan, M., Mahtani, S. M., Willner, J. B., Garcia, R., Goff, B. A.,<br />

and King, M. C. (2005) Tumor-specific p53 sequences in blood and peritoneal fluid<br />

of women with epithelial ovarian cancer. Am. J. Obstet. Gynecol. 193, 662–667.<br />

15. Pisitkun, T., Johnstone, R., and Knepper, M. A. (2006) Discovery of urinary<br />

biomarkers. Mol. Cell Proteomics 5, 1760–1771.<br />

16. Ghafouri, B., Irander, K., Lindbom, J., Tagesson, C., and Lindahl, M. (2006)<br />

Comparative proteomics of nasal fluid in seasonal allergic rhinitis. J. Proteome<br />

Res. 5, 330–338.<br />

17. Koo, B. S., Lee, D. Y., Ha, H. S., Kim, J. C., and Kim, C. W. (2005) Comparative<br />

analysis of the tear protein expression in blepharitis patients using two-dimensional<br />

electrophoresis. J. Proteome Res. 4, 719–724.<br />

18. Grus, F. H., Podust, V. N., Bruns, K., Lackner, K., Fu, S., Dalmasso, E. A.,<br />

Wirthlin, A., and Pfeiffer, N. (2005) SELDI-TOF-MS ProteinChip array profiling<br />

of tears from patients with dry eye. Invest. Ophthalmol. Vis. Sci. 46, 863–876.<br />

19. Amado, F. M., Vitorino, R. M., Domingues, P. M., Lobo, M. J., and Duarte, J. A.<br />

(2005) Analysis of the human saliva proteome. Expert Rev. Proteomics 2, 521–539.<br />

20. Wang, T. H., Chang, Y. L., Peng, H. H., Wang, S. T., Lu, H. W., Teng, S. H.,<br />

Chang, S. D., and Wang, H. S. (2005) Rapid detection of fetal aneuploidy using<br />

proteomics approaches on amniotic fluid supernatant. Prenat. Diagn. 25, 559–566.<br />

21. Ruetschi, U., Rosen, A., Karlsson, G., Zetterberg, H., Rymo, L., Hagberg,<br />

H., and Jacobsson, B. (2005) Proteomic analysis using protein chips to detect


biomarkers in cervical and amniotic fluid in women with intra-amniotic inflammation.<br />

J. Proteome Res. 4, 2236–2242.<br />

22. Kim, Y. S., Kim, M. S., Lee, S. H., Choi, B. C., Lim, J. M., Cha, K. Y., and<br />

Baek, K. H. (2006) Proteomic analysis of recurrent spontaneous abortion: identification<br />

of an inadequately expressed set of proteins in human follicular fluid.<br />

Proteomics 6, 3445–3454.<br />

23. Pilch, B. and Mann, M. (2006) Large-scale and high-confidence proteomic analysis<br />

of human seminal plasma. Genome Biol. 7, R40.<br />

24. Varnum, S. M., Covington, C. C., Woodbury, R. L., Petritis, K., Kangas, L. J.,<br />

Abdullah, M. S., Pounds, J. G., Smith, R. D., and Zangar, R. C. (2003) Proteomic<br />

characterization of nipple aspirate fluid: identification of potential biomarkers of<br />

breast cancer. Breast Cancer Res. Treat. 80, 87–97.<br />

25. Zheng, P. P., Luider, T. M., Pieters, R., Avezaat, C. J., van den Bent, M. J., Sillevis<br />

Smitt, P. A., and Kros, J. M. (2003) Identification of tumor-related proteins by<br />

proteomic analysis of cerebrospinal fluid from patients with primary brain tumors.<br />

J. Neuropathol. Exp. Neurol. 62, 855–862.<br />

26. Gibson, D. S., Blelock, S., Brockbank, S., Curry, J., Healy, A., McAllister, C.,<br />

and Rooney, M. E. (2006) Proteomic analysis of recurrent joint inflammation in<br />

juvenile idiopathic arthritis. J. Proteome Res. 5, 1988–1995.<br />

27. Merkel, D., Rist, W., Seither, P., Weith, A., and Lenter, M. C. (2005)<br />

Proteomic study of human bronchoalveolar lavage fluids from smokers with<br />

chronic obstructive pulmonary disease by combining surface-enhanced laser<br />

desorption/ionization-mass spectrometry profiling with mass spectrometric protein<br />

identification. Proteomics 5, 2972–2980.<br />

28. Wu, J., Kobayashi, M., Sousa, E. A., Liu, W., Cai, J., Goldman, S. J., Dorner, A. J.,<br />

Projan, S. J., Kavuru, M. S., Qiu, Y., and Thomassen, M. J. (2005) Differential<br />

proteomic analysis of bronchoalveolar lavage fluid in asthmatics following<br />

segmental antigen challenge. Mol. Cell Proteomics 4, 1251–1264.<br />

29. Tyan, Y. C., Wu, H. Y., Lai, W. W., Su, W. C., and Liao, P. C. (2005) Proteomic<br />

profiling of human pleural effusion using two-dimensional nano liquid chromatography<br />

tandem mass spectrometry. J. Proteome Res. 4, 1274–1286.<br />

30. Khalil, A. A. and James, P. (2007) Biomarker discovery: a proteomic approach for<br />

brain cancer profiling. Cancer Sci. 98, 201–213.<br />

31. Khodavirdi, A. C., Song, Z., Yang, S., Zhong, C., Wang, S., Wu, H., Pritchard, C.,<br />

Nelson, P. S., and Roy-Burman, P. (2006) Increased expression of osteopontin<br />

contributes to the progression of prostate cancer. Cancer Res. 66, 883–888.<br />

32. Vincourt, J. B., Lionneton, F., Kratassiouk, G., Guillemin, F., Netter, P., Mainard, D.,<br />

and Magdalou, J. (2006) Establishment of a reliable method for direct proteome<br />

characterization of human articular cartilage. Mol. Cell Proteomics 5, 1984–1995.<br />

33. Lee, Y. J., Rice, R. H., and Lee, Y. M. (2006) Proteome analysis of human<br />

hair shaft: from protein identification to post-translational modification. Mol. Cell<br />

Proteomics 5, 789–800.<br />

34. Cho, S. Y., Lee, E. Y., Lee, J. S., Kim, H. Y., Park, J. M., Kwon, M. S., Park, Y. K.,<br />

Lee, H. J., Kang, M. J., Kim, J. Y., Yoo, J. S., Park, S. J., Cho, J. W., Kim, H. S., and


Paik, Y. K. (2005) Efficient prefractionation of low-abundance proteins in human<br />

plasma and construction of a two-dimensional map. Proteomics 5, 3386–3396.<br />

35. Lathrop, J. T., Hayes, T. K., Carrick, K., and Hammond, D. J. (2005) Rarity gives<br />

a charm: evaluation of trace proteins in plasma and serum. Expert Rev. Proteomics<br />

2, 393–406.<br />

36. Lee, H. J., Lee, E. Y., Kwon, M. S., and Paik, Y. K. (2006) Biomarker discovery<br />

from the plasma proteome using multidimensional fractionation proteomics. Curr.<br />

Opin. Chem. Biol. 10, 42–49.<br />

37. Anderson, N. L. and Anderson, N. G. (2002) The human plasma proteome: history,<br />

character, and diagnostic prospects. Mol. Cell Proteomics 1, 845–867.<br />

38. Hu, S., Loo, J. A., and Wong, D. T. (2006) Human body fluid proteome analysis.<br />

Proteomics 6, 6326–6353.<br />

39. Park, M. R., Wang, E. H., Jin, D. C., Cha, J. H., Lee, K. H., Yang, C. W.,<br />

Kang, C. S., and Choi, Y. J. (2006) Establishment of a 2-D human urinary proteomic<br />

map in IgA nephropathy. Proteomics 6, 1066–1076.<br />

40. Tammen, H., Schulte, I., Hess, R., Menzel, C., Kellmann, M., and<br />

Schulz-Knappe, P. (2005) Prerequisites for peptidomic analysis of blood samples: I.<br />

Evaluation of blood specimen qualities and determination of technical performance<br />

characteristics. Comb. Chem. High Throughput Screen. 8, 725–733.<br />

41. Rai, A. J., Gelfand, C. A., Haywood, B. C., Warunek, D. J., Yi, J., Schuchard, M. D.,<br />

Mehigh, R. J., Cockrill, S. L., Scott, G. B., Tammen, H., Schulz-Knappe, P.,<br />

Speicher, D. W., Vitzthum, F., Haab, B. B., Siest, G., and Chan, D. W.<br />

(2005) HUPO plasma proteome project specimen collection and handling: towards<br />

the standardization of parameters for plasma proteome samples. Proteomics 5,<br />

3262–3277.<br />

42. Zhou, M., Lucas, D. A., Chan, K. C., Issaq, H. J., Petricoin, E. F. 3rd, Liotta, L. A.,<br />

Veenstra, T. D., and Conrads, T. P. (2004) An investigation into the human serum<br />

“interactome”. Electrophoresis 25, 1289–1298.<br />

43. Findeisen, P., Sismanidis, D., Riedl, M., Costina, V., and Neumaier, M. (2005)<br />

Preanalytical impact of sample handling on proteome profiling experiments with<br />

matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Clin.<br />

Chem. 51, 2409–2411.<br />

44. Park, K. S., Kim, H., Kim, N. G., Cho, S. Y., Choi, K. H., Seong, J. K., and Paik,<br />

Y. K. (2002) Proteomic analysis and molecular characterization of tissue ferritin<br />

light chain in hepatocellular carcinoma. Hepatology 35, 1459–1466.<br />

45. Park, K. S., Cho, S. Y., Kim, H., and Paik, Y. K. (2002) Proteomic alterations of the<br />

variants of human aldehyde dehydrogenase isozymes correlate with hepatocellular<br />

carcinoma. Int. J. Cancer 97, 261–265.<br />

46. Marko-Varga, G., Berglund, M., Malmstrom, J., Lindberg, H., and Fehniger, T. E.<br />

(2003) Targeting hepatocytes from liver tissue by laser capture microdissection<br />

and proteomics expression profiling. Electrophoresis 24, 3800–3805.<br />

47. Paradis, V., Degos, F., Dargere, D., Pham, N., Belghiti, J., Degott, C., Janeau,<br />

J. L., Bezeaud, A., Delforge, D., Cubizolles, M., Laurendeau, I., and Bedossa, P.<br />

(2005) Identification of a new biomarker of hepatocellular carcinoma by serum<br />

protein profiling of patients with chronic liver diseases. Hepatology 41, 40–47.


48. Ru, Q. C., Zhu, L. A., Silberman, J., and Shriver, C. D. (2006) Label-free semiquantitative<br />

peptide feature profiling of human breast cancer and breast disease sera via<br />

two-dimensional liquid chromatography–mass spectrometry. Mol. Cell Proteomics<br />

5, 1095–1104.<br />

49. Azad, N. S., Rasool, N., Annuziata, C. M., Minasian, L., Whiteley, G., and<br />

Kohn, E. C. (2006) Proteomics in clinical trials and practice: present uses and<br />

future promise. Mol. Cell Proteomics 5, 1819–1829.<br />

50. Gunter, E. W. (1997) Biological and environmental specimen banking at the<br />

Centers for Disease Control and Prevention. Chemosphere 34, 1945–1953.<br />

51. Strauss, G. H. and Kelly, S. J. (1990) The development of the U.S. EPA health<br />

effects research laboratory frozen blood cell repository program. Mutat. Res. 234,<br />

349–354.<br />

52. Romeo, M. J., Espina, V., Lowenthal, M., Espina, B. H., Petricoin, E. F. 3rd, and<br />

Liotta, L. A. (2005) CSF proteome: a protein repository for potential biomarker<br />

identification. Expert Rev. Proteomics 2, 57–70.<br />

53. Conrads, T. P., Hood, B. L., Petricoin, E. F. 3rd, Liotta, L. A., and Veenstra, T. D.<br />

(2005) Cancer proteomics: many technologies, one goal. Expert Rev. Proteomics<br />

2, 693–703.<br />

54. Schrader, M. and Selle, H. (2006) The process chain for peptidomic biomarker<br />

discovery. Dis. Markers 22, 27–37.<br />

55. Danna, E. A. and Nolan, G. P. (2006) Transcending the biomarker mindset:<br />

deciphering disease mechanisms at the single cell level. Curr. Opin. Chem. Biol.<br />

10, 20–27.<br />

56. De Masi, S., Tosti, M. E., and Mele, A. (2005) Screening for hepatocellular<br />

carcinoma. Dig. Liver Dis. 37, 260–268.<br />

57. Yamaguchi, K., Nagano, M., Torada, N., Hamasaki, N., Kawakita, M., and<br />

Tanaka, M. (2004) Urine diacetylspermine as a novel tumor marker for pancreatobiliary<br />

carcinomas. Rinsho. Byori. 52, 336–339.<br />

58. Dabrowska, M., Grubek-Jaworska, H., Domagala-Kulawik, J., Bartoszewicz, Z.,<br />

Kondracka, A., Krenke, R., Nejman, P., and Chazan, R. (2004) Diagnostic usefulness<br />

of selected tumor markers (CA125, CEA, CYFRA 21–1) in bronchoalveolar lavage<br />

fluid in patients with non-small cell lung cancer. Pol. Arch. Med. Wewn. 111, 659–665.<br />

59. Gann, P. H., Hennekens, C. H., and Stampfer, M. J. (1995) A prospective evaluation<br />

of plasma prostate-specific antigen for detection of prostatic cancer. JAMA 273,<br />

289–294.<br />

60. Ciambellotti, E., Coda, C., and Lanza, E. (1993) Determination of CA 15–3 in the<br />

control of primary and metastatic breast carcinoma. Minerva Med. 84, 107–112.<br />

61. Linkov, F., Lisovich, A., Yurkovetsky, Z., Marrangoni, A., Velikokhatnaya, L.,<br />

Nolen, B., Winans, M., Bigbee, W., Siegfried, J., Lokshin, A., and Ferris, R. L.<br />

(2007) Early detection of head and neck cancer: development of a novel screening<br />

tool using multiplexed immunobead-based biomarker profiling. Cancer Epidemiol.<br />

Biomarkers Prev. 16, 102–107.<br />

62. Casiano, C. A., Mediavilla-Varela, M., and Tan, E. M. (2006) Tumor-associated<br />

antigen arrays for the serological diagnosis of cancer. Mol. Cell Proteomics 5,<br />

1745–1759.


63. Nissom, P. M., Lo, S. L., Lo, J. C., Ong, P. F., Lim, J. W., Ou, K., Liang, R. C.,<br />

Seow, T. K., and Chung, M. C. (2006) Hcc-2, a novel mammalian ER thioredoxin<br />

that is differentially expressed in hepatocellular carcinoma. FEBS Lett. 580, 2216–<br />

2226.<br />

64. Feng, J. T., Liu, Y. K., Song, H. Y., Dai, Z., Qin, L. X., Almofti, M. R., Fang, C. Y.,<br />

Lu, H. J., Yang, P. Y., and Tang, Z. Y. (2005) Heat-shock protein 27: a potential<br />

biomarker for hepatocellular carcinoma identified by serum proteome analysis.<br />

Proteomics 5, 4581–4588.<br />

65. Li, D. Q., Wang, L., Fei, F., Hou, Y. F., Luo, J. M., Wei-Chen, Zeng, R.,<br />

Wu, J., Lu, J. S., Di, G. H., Ou, Z. L., Xia, Q. C., Shen, Z. Z., and<br />

Shao, Z. M. (2006) Identification of breast cancer metastasis-associated proteins<br />

in an isogenic tumor metastasis model using two-dimensional gel electrophoresis<br />

and liquid chromatography-ion trap-mass spectrometry. Proteomics 6,<br />

3352–3368.<br />

66. Lee, I. N., Chen, C. H., Sheu, J. C., Lee, H. S., Huang, G. T., Yu, C. Y.,<br />

Lu, F. J., and Chow, L. P. (2005) Identification of human hepatocellular carcinoma-related<br />

biomarkers by two-dimensional difference gel electrophoresis and mass<br />

spectrometry. J. Proteome Res. 4, 2062–2069.<br />

67. Righetti, P. G., Castagna, A., Antonucci, F., Piubelli, C., Cecconi, D.,<br />

Campostrini, N., Rustichelli, C., Antonioli, P., Zanusso, G., Monaco, S., Lomas, L.,<br />

and Boschetti, E. (2005) Proteome analysis in the clinical chemistry laboratory:<br />

myth or reality? Clin. Chim. Acta 357, 123–139.<br />

68. Jang, J. S., Cho, H. Y., Lee, Y. J., Ha, W. S., and Kim, H. W. (2004) The<br />

differential proteome profile of stomach cancer: identification of the biomarker<br />

candidates. Oncol. Res. 14, 491–499.<br />

69. Steel, L. F., Shumpert, D., Trotter, M., Seeholzer, S. H., Evans, A. A., London,<br />

W. T., Dwek, R., and Block, T. M. (2003) A strategy for the comparative analysis<br />

of serum proteomes for the discovery of biomarkers for hepatocellular carcinoma.<br />

Proteomics 3, 601–609.<br />

70. Yip, T. T., Chan, J. W., Cho, W. C., Yip, T. T., Wang, Z., Kwan, T. L., Law, S. C.,<br />

Tsang, D. N., Chan, J. K., Lee, K. C., Cheng, W. W., Ma, V. W., Yip, C.,<br />

Lim, C. K., Ngan, R. K., Au, J. S., Chan, A., Lim, W. W., and Ciphergen SARS<br />

Proteomics Study Group (2005) Protein chip array profiling analysis in patients<br />

with severe acute respiratory syndrome identified serum amyloid a protein as a<br />

biomarker potentially useful in monitoring the extent of pneumonia. Clin. Chem. 51,<br />

47–55.<br />

71. Anderson, L. and Hunter, C. L. (2005) Quantitative mass spectrometric multiple<br />

reaction monitoring assays for major plasma proteins. Mol. Cell Proteomics 5,<br />

573–588.<br />

72. Lee, J. W., Figeys, D., and Vasilescu, J. (2007) Biomarker assay translation from<br />

discovery to clinical studies in cancer drug development: quantification of emerging<br />

protein biomarkers. Adv. Cancer Res. 96, 269–298.<br />

73. Zolg, W. (2006) The proteomic search for diagnostic biomarkers: lost in translation<br />

Mol. Cell Proteomics 5, 1720–1726.



74. Bensmail, H., Golek, J., Moody, M. M., Semmes, J. O., and Haoudi, A. (2005) A novel approach for clustering proteomics data using Bayesian fast Fourier transform. Bioinformatics 21, 2210–2224.

75. Ward, D. G., Cheng, Y., N’Kontchou, G., Thar, T. T., Barget, N., Wei, W., Billingham, L. J., Martin, A., Beaugrand, M., and Johnson, P. J. (2006) Changes in the serum proteome associated with the development of hepatocellular carcinoma in hepatitis C-related cirrhosis. Br. J. Cancer 94, 287–292.

76. Lin, N. and Zhao, H. (2005) Are scale-free networks robust to measurement errors? BMC Bioinformatics 6, 119.

77. Castagna, A., Cecconi, D., Sennels, L., Rappsilber, J., Guerrier, L., Fortis, F., Boschetti, E., Lomas, L., and Righetti, P. G. (2005) Exploring the hidden human urinary proteome via ligand library beads. J. Proteome Res. 4, 1917–1930.

78. Rauch, A., Bellew, M., Eng, J., Fitzgibbon, M., Holzman, T., Hussey, P., Igra, M., Maclean, B., Lin, C. W., Detter, A., Fang, R., Faca, V., Gafken, P., Zhang, H., Whiteaker, J., States, D., Hanash, S., Paulovich, A., and McIntosh, M. W. (2006) Computational proteomics analysis system (CPAS): an extensible open source analytic system for evaluating and publishing proteomic data and high throughput biological experiments. J. Proteome Res. 5, 112–121.

79. Lilley, K. S. and Friedman, D. B. (2004) All about DIGE: quantification technology for differential-display 2D-gel proteomics. Expert Rev. Proteomics 1, 401–409.

80. Qian, W. J., Jacobs, J. M., Liu, T., Camp, D. G. 2nd, and Smith, R. D. (2006) Advances and challenges in liquid chromatography-mass spectrometry-based proteomics profiling for clinical applications. Mol. Cell Proteomics 5, 1727–1744.

81. Powell, D. W., Merchant, M. L., and Link, A. J. (2006) Discovery of regulatory molecular events and biomarkers using 2D capillary chromatography and mass spectrometry. Expert Rev. Proteomics 3, 63–74.

82. Andre, M., Le Caer, J. P., Greco, C., Planchon, S., El Nemer, W., Boucheix, C., Rubinstein, E., Chamot-Rooke, J., and Le Naour, F. (2006) Proteomic analysis of the tetraspanin web using LC-ESI-MS/MS and MALDI-FTICR-MS. Proteomics 6, 1437–1449.

83. Greengauz-Roberts, O., Stoppler, H., Nomura, S., Yamaguchi, H., Goldenring, J. R., Podolsky, R. H., Lee, J. R., and Dynan, W. S. (2005) Saturation labeling with cysteine-reactive cyanine fluorescent dyes provides increased sensitivity for protein expression profiling of laser-microdissected clinical specimens. Proteomics 5, 1746–1757.

84. Heck, A. J. and Krijgsveld, J. (2004) Mass spectrometry-based quantitative proteomics. Expert Rev. Proteomics 1, 317–326.

85. Schneider, L. V. and Hall, M. P. (2005) Stable isotope methods for high-precision proteomics. Drug Discov. Today 10, 353–363.

86. Zhang, J., Goodlett, D. R., Peskind, E. R., Quinn, J. F., Zhou, Y., Wang, Q., Pan, C., Yi, E., Eng, J., Aebersold, R. H., and Montine, T. J. (2005) Quantitative proteomic analysis of age-related changes in human cerebrospinal fluid. Neurobiol. Aging 26, 207–227.



87. Liu, T., Qian, W. J., Strittmatter, E. F., Camp, D. G. 2nd, Anderson, G. A., Thrall, B. D., and Smith, R. D. (2004) High-throughput comparative proteome analysis using a quantitative cysteinyl-peptide enrichment technology. Anal. Chem. 76, 5345–5353.

88. Li, C., Hong, Y., Tan, Y. X., Zhou, H., Ai, J. H., Li, S. J., Zhang, L., Xia, Q. C., Wu, J. R., Wang, H. Y., and Zeng, R. (2004) Accurate qualitative and quantitative proteomic analysis of clinical hepatocellular carcinoma using laser capture microdissection coupled with isotope-coded affinity tag and two-dimensional liquid chromatography mass spectrometry. Mol. Cell Proteomics 3, 399–409.

89. Sheehan, K. M., Calvert, V. S., Kay, E. W., Lu, Y., Fishman, D., Espina, V., Aquino, J., Speer, R., Araujo, R., Mills, G. B., Liotta, L. A., Petricoin, E. F. 3rd, and Wulfkuhle, J. D. (2005) Use of reverse phase protein microarrays and reference standard development for molecular network analysis of metastatic ovarian carcinoma. Mol. Cell Proteomics 4, 346–355.

90. Knezevic, V., Leethanakul, C., Bichsel, V. E., Worth, J. M., Prabhu, V. V., Gutkind, J. S., Liotta, L. A., Munson, P. J., Petricoin, E. F. 3rd, and Krizman, D. B. (2001) Proteomic profiling of the cancer microenvironment by antibody arrays. Proteomics 1, 1271–1278.

91. Sharma-Oates, A., Quirke, P., and Westhead, D. R. (2005) TmaDB: a repository for tissue microarray data. BMC Bioinformatics 6, 218.

92. Rai, A. J., Stemmer, P. M., Zhang, Z., Adam, B. L., Morgan, W. T., Caffrey, R. E., Podust, V. N., Patel, M., Lim, L. Y., Shipulina, N. V., Chan, D. W., Semmes, O. J., and Leung, H. C. (2005) Analysis of human proteome organization plasma proteome project (HUPO PPP) reference specimens using surface enhanced laser desorption/ionization-time of flight (SELDI-TOF) mass spectrometry: multi-institution correlation of spectra and identification of biomarkers. Proteomics 5, 3467–3474.

93. Engwegen, J. Y., Gast, M. C., Schellens, J. H., and Beijnen, J. H. (2006) Clinical proteomics: searching for better tumour markers with SELDI-TOF mass spectrometry. Trends Pharmacol. Sci. 27, 251–259.

94. Domon, B. and Aebersold, R. (2006) Mass spectrometry and protein analysis. Science 312, 212–217.

95. Domon, B. and Aebersold, R. (2006) Challenges and opportunities in proteomics data analysis. Mol. Cell Proteomics 5, 1921–1926.

96. Uhlen, M. and Ponten, F. (2005) Antibody-based proteomics for human tissue profiling. Mol. Cell Proteomics 4, 384–393.

97. Taussig, M. J., Stoevesandt, O., Borrebaeck, C. A., Bradbury, A. R., Cahill, D., Cambillau, C., de Daruvar, A., Dubel, S., Eichler, J., Frank, R., Gibson, T. J., Gloriam, D., Gold, L., Herberg, F. W., Hermjakob, H., Hoheisel, J. D., Joos, T. O., Kallioniemi, O., Koegl, M., Konthur, Z., Korn, B., Kremmer, E., Krobitsch, S., Landegren, U., van der Maarel, S., McCafferty, J., Muyldermans, S., Nygren, P. A., Palcy, S., Pluckthun, A., Polic, B., Przybylski, M., Saviranta, P., Sawyer, A., Sherman, D. J., Skerra, A., Templin, M., Ueffing, M., and Uhlen, M. (2007) ProteomeBinders: planning a European resource of affinity reagents for analysis of the human proteome. Nat. Methods 4, 13–17.

98. Ilyin, S. E., Belkowski, S. M., and Plata-Salaman, C. R. (2004) Biomarker discovery and validation: technologies and integrative approaches. Trends Biotechnol. 22, 411–416.


I

Specimen Collection for Clinical Proteomics


2

Specimen Collection and Handling

Standardization of Blood Sample Collection

Harald Tammen

Summary

Preanalytical variables can alter the analysis of blood-derived samples. Before a blood sample can be analyzed, multiple steps are necessary to generate the desired specimen. The choice of blood specimen and its collection, handling, processing, and storage are important, since they can have a tremendous impact on the results of the analysis.

Awareness of clinical practices in medical laboratories, together with current knowledge, allows the identification of specific variables that affect the results of a proteomic study. Knowledge of preanalytical variables is a prerequisite for understanding and controlling their impact.

Key Words: blood; plasma; serum; proteomics; specimen; preanalytical variables.

1. Introduction

Proteomic analysis of blood specimens by semi-quantitative multiplex techniques offers a valuable approach for the discovery of disease- or therapy-related biomarkers (1,2). Based on reproducible separation of proteins by their physicochemical properties, in combination with semi-quantitative detection methods and bioinformatic data analysis, proteomics allows sensitive measurement of proteins in blood specimens (3). Blood can be regarded as a complex liquid tissue that comprises cells and extracellular fluid (4). The choice of a suitable specimen-collection protocol is crucial to minimize artificial processes (e.g., cell lysis, proteolysis) during specimen collection and preparation (5). Preanalytical procedures can alter the analysis of blood-derived samples. These procedures comprise all processes prior to the actual analysis of the sample and include the steps needed to obtain the primary sample (e.g., blood) and the analytical specimen (e.g., plasma, serum, cells). Legal and ethical issues (e.g., the importance of informed consent) and potential risks of phlebotomy (e.g., bleeding) are not covered in this chapter.

From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols
Edited by: A. Vlahou © Humana Press, Totowa, NJ

1.1. Collection of Blood Samples

It has been reported that the most frequent faults in the preanalytical phase result from erroneous sample collection procedures (e.g., drawing blood from an infusion line, which dilutes the sample) (6). The design of blood collection devices can help ensure correct sampling: evacuated containers draw an accurate volume of blood, which ensures the correct concentration of additives or the correct dilution of the blood, as in the case of citrated plasma. The speed of the blood draw is also controlled, which limits mechanical stress. The preferred collection site is the median cubital vein, which is generally easy to locate and access. Venipuncture at this site is therefore the most comfortable for the patient and should not cause additional stress. Preparation of the collection site includes proper cleaning of the skin with alcohol (2-propanol). The alcohol must be allowed to evaporate, since residual alcohol mixing with the blood sample may cause hemolysis, raise the levels of certain analytes, and cause interferences. The position of the patient (standing, lying, sitting) can affect the hematocrit (7) and hence may change analyte concentrations. The tourniquet should be applied 3–4 inches above the venipuncture site and released as soon as blood begins flowing into the collection device. Venous occlusion lasting longer than 1 min can affect the sample composition: prolonged occlusion may cause hemoconcentration and consequently increase the levels of various analytes, e.g., total protein. Blood should be collected from fasting patients in the morning between 7 and 9 a.m., because food intake and circadian rhythms can alter analyte concentrations considerably (e.g., total protein, hemoglobin, myoglobin).
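The collection conditions in this section can be checked systematically for each draw: a fasting patient, a draw between 7 and 9 a.m., venous occlusion of no more than 1 min, no draw from an infusion line, and a controlled patient position. The following Python sketch is only an illustration. The `BloodDraw` record, its field names, and the assumption that the study protocol prescribes a sitting position are hypothetical and are not part of the original protocol.

```python
from dataclasses import dataclass
from datetime import time

MORNING_WINDOW = (time(7, 0), time(9, 0))  # draw between 7 and 9 a.m.
MAX_TOURNIQUET_S = 60                      # venous occlusion of at most 1 min


@dataclass
class BloodDraw:
    """Hypothetical record of a single blood draw."""
    fasting: bool
    draw_time: time
    tourniquet_seconds: int
    from_infusion_line: bool
    position: str  # e.g. "sitting", "supine", "upright"


def collection_issues(draw: BloodDraw, protocol_position: str = "sitting") -> list[str]:
    """List deviations from the collection conditions of Section 1.1.

    `protocol_position` is an assumed study setting; the chapter only states
    that the patient's position affects the hematocrit and must be controlled.
    """
    issues = []
    if not draw.fasting:
        issues.append("patient not fasting")
    if not MORNING_WINDOW[0] <= draw.draw_time <= MORNING_WINDOW[1]:
        issues.append("drawn outside 7-9 a.m.")
    if draw.tourniquet_seconds > MAX_TOURNIQUET_S:
        issues.append("tourniquet applied > 1 min")
    if draw.from_infusion_line:
        issues.append("drawn from an infusion line")
    if draw.position != protocol_position:
        issues.append(f"position {draw.position!r} differs from protocol")
    return issues
```

For example, `collection_issues(BloodDraw(True, time(8, 15), 45, False, "sitting"))` returns an empty list, whereas a non-fasting patient sampled at 10:30 a.m. is flagged on both counts. The function reports deviations rather than rejecting samples, which leaves the decision about exclusion to the study team.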

1.2. Characteristics of Serum and Plasma Specimens

Serum is one of the most frequently analyzed blood specimens. Generating serum is time-consuming and involves activation of the coagulation cascade and the complement system. These processes influence the composition of the sample because they result in cell lysis (e.g., of thrombocytes and erythrocytes). As a consequence, the concentrations of cell-derived components in the extracellular fluid, such as aspartate aminotransferase, serotonin, neuron-specific enolase, and lactate dehydrogenase, are increased (8). On the other hand, degradation of analytes (e.g., hormones) may occur faster (9). At the proteomic level, more peptides and fewer proteins are observed in serum than in plasma (10,11).

Consequently, the activation of the clotting cascade needed to generate serum can lead to artifacts. One argument for using serum as a specimen is the notion that the serum proteome or peptidome may reflect biological events (12). Post-sampling proteolytic cleavage products have been proposed as biomarkers, and it has further been suggested that the serum peptidome is of particular diagnostic value for the detection of cancer (13). However, it has been reported that more protein changes occur in serum than in plasma (14). The reproducibility of such ex vivo proteolytic events can therefore be expected to be comparatively low.

Unlike serum, plasma is obtained from anticoagulated blood. Citrate and EDTA inhibit coagulation and other enzymatic processes by chelating ions, thereby inhibiting ion-dependent enzymes, whereas heparin acts through the activation of antithrombin III. The main concern with heparinized plasma for proteomic studies is that heparin is a polydisperse, charged molecule that binds many proteins non-specifically (15,16). Because its molecular weight is similar to that of peptides and small proteins, heparin may also interfere with separation procedures and with the mass spectrometric detection of these analytes (17).

Plasma is obtained more quickly than serum. The cells and the liquid phase can be separated directly after sample collection, since no clotting time (30–60 min) is required. Compared with serum, approximately 10–20% more plasma is obtained from the same volume of blood. The protein content of plasma is also higher than that of serum because of the presence of clotting factors and associated components. Furthermore, in serum, proteins may be bound to the clot, which decreases the protein concentration.

1.3. Processing of Blood Samples

Rapid separation of cells from plasma is favorable, since cellular constituents may release substances that alter the composition of the sample. Generally, plasma and serum should be centrifuged at 1300–2000×g for 10 min within 30 min of sample collection. The temperature should generally be 15–24°C (18), unless a different temperature is recommended for particular analytes such as gastrin or A-type natriuretic peptide. Processing at 4°C appears attractive, because enzymatic degradation is reduced at low temperatures. However, platelets become activated at low temperatures (19) and release intracellular proteins and enzymes, which alter the sample composition. Processing at low temperatures is therefore safe only after thrombocytes have been removed. A single centrifugation step may be insufficient to deplete platelets below 10 cells/nL, so a second centrifugation step (2500×g for 15 min at room temperature) or a filtration step may be required to obtain platelet-poor plasma. This procedure is applicable only to plasma, since the platelets in serum are already activated.
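The centrifugation conditions in this chapter are given as relative centrifugal force (RCF, ×g), whereas many centrifuges are set in revolutions per minute (rpm). The two are related through the rotor radius r (in cm) by the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm². The Python sketch below converts between the two. The 15-cm radius is a hypothetical value; use the radius that the manufacturer specifies for the actual rotor.

```python
import math

# Standard relation between relative centrifugal force (RCF, in x g) and
# rotor speed: RCF = 1.118e-5 * r * rpm**2, with r the rotor radius in cm.
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) produced at `rpm` for a given radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_for_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach `rcf` (x g) at a given radius."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


if __name__ == "__main__":
    radius = 15.0  # hypothetical swing-bucket radius in cm
    for g in (1300, 2000, 2500):  # forces used in this chapter
        print(f"{g} x g -> {rpm_for_rcf(g, radius):.0f} rpm")
    # The mix-up listed under Notes: entering 2000 as rpm instead of x g
    print(f"2000 rpm -> {rcf_from_rpm(2000, radius):.0f} x g")
```

With the assumed 15-cm radius, 2000×g corresponds to roughly 3450 rpm. Entering 2000 as an rpm value would produce only about 670×g, far below the intended force.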

1.4. Protease Inhibitors

Protease inhibitors would be attractive, but commonly used protease inhibitor cocktails may cause difficulties because they interfere with mass spectrometry and form covalent bonds with proteins, which shifts the isoform pattern (20). Protease inhibitors have been considered and investigated as additives in proteome research to prevent or slow down proteolytic processes and thereby allow more sensitive detection of markers in blood (21). Although protein integrity has been shown to be maintained by the addition of 15 commercially available protease inhibitors, the usefulness of protease inhibitors for the overall protein stabilization of blood samples remains to be investigated in more detail (22). Moreover, certain protease inhibitors are toxic to live cells in whole blood. Stressed, apoptotic, or necrotic cells release substances, and it may be argued that this affects the composition of serum or plasma until the cellular and soluble fractions of blood are separated. However, careful selection of an appropriate protease inhibitor may solve this problem.

2. Materials

1. Twenty-gauge needles and an appropriate adapter (e.g., Sarstedt, Nümbrecht, Germany) or a Vacutainer system (BD Bioscience, Franklin Lakes, USA).
2. Alcohol (2-propanol) in a spray bottle.
3. Swabs.
4. Examination gloves.
5. Tourniquet or sphygmomanometer.
6. Blood collection tubes (e.g., Sarstedt).
7. Centrifuge with a swinging-bucket rotor (e.g., Sigma 4K15, Sigma Laborzentrifugen, Osterode am Harz, Germany).
8. A 10-mL syringe equipped with a cellulose acetate filter unit with 0.2-μm pore size and 5-cm² filtration area (e.g., Sartorius Minisart, Sartorius, Göttingen, Germany).
9. 2-mL cryovials.
10. Pipette and tips.

3. Methods

1. Venipuncture of a cubital vein is performed using a 20-gauge needle (diameter: 0.9 mm; e.g., a butterfly system with a maximum tubing length of 6 cm). If a tourniquet is applied, it should not remain in place for longer than 1 min, because hemoconcentration may falsify the results. As soon as blood flows into the container, the tourniquet must be released at least partially. If more time is required, the tourniquet must be released so that circulation resumes and normal skin color returns to the extremity.
   • Before blood is collected for proteomic analysis, blood is drawn into a first container (e.g., 2.7-mL S-Monovette, Sarstedt, Nümbrecht, Germany). This flushes the surface and removes initial traces of contact-induced coagulation. This sample is not used for analysis.
   • Afterward, blood is drawn into a standard EDTA- or citrate-containing syringe (e.g., 9-mL EDTA-Monovette, Sarstedt, Nümbrecht, Germany). Depending on the ease of blood flow, several samples can be collected. Free flow with only mild aspiration should be ensured to avoid hemolysis.
2. After venipuncture, plasma is obtained by centrifugation for 10 min at 2000×g at room temperature. Centrifugation should start within 30 min of blood collection. The resulting plasma can now be separated from red and white blood cells efficiently and gently. Nevertheless, a significant number of platelets (∼25%) remains in the sample, which requires an additional preparation step.
3. For platelet depletion, one of the following procedures must be performed directly after step 2:
   • Platelet removal by centrifugation: The plasma is transferred into a second vial and centrifuged again for 15 min at 2500×g at room temperature. After centrifugation, the supernatant is transferred in 1.5-mL aliquots into cryovials.
   • Platelet removal by filtration: Plasma aliquots of 1.5 mL from step 2 are transferred into 2-mL cryovials using a 10-mL syringe equipped with a cellulose acetate filter unit with 0.2-μm pore size and 5-cm² filtration area (e.g., Sartorius Minisart®, Sartorius, Göttingen, Germany). Filtration requires only gentle pressure.
4. Samples are transferred to a –80°C freezer within 30 min and stored at –80°C. Samples are transported on dry ice.
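The time and temperature limits in steps 2–4 and in Section 1.3 can be recorded and checked for each sample. These are a first centrifugation at 1300–2000×g for 10 min at 15–24°C, started within 30 min of the draw; freezing within 30 min; and residual platelets below 10 cells/nL. The Python sketch below shows one way to do this. The `ProcessingLog` fields are hypothetical. Counting the 30-min freezing limit from the end of platelet depletion is an assumption, since step 4 does not state the reference point.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Limits taken from the protocol text; the reference point for the freezing
# deadline (end of platelet depletion) is an assumption.
MAX_TO_CENTRIFUGE = timedelta(minutes=30)
MAX_TO_FREEZER = timedelta(minutes=30)
FIRST_SPIN_G = (1300, 2000)   # x g range from Section 1.3
FIRST_SPIN_MINUTES = 10
SPIN_TEMP_C = (15.0, 24.0)    # room-temperature range from Section 1.3
MAX_PLATELETS_PER_NL = 10.0   # platelet-poor plasma threshold


@dataclass
class ProcessingLog:
    """Hypothetical per-sample processing record."""
    drawn_at: datetime
    first_spin_start: datetime
    first_spin_g: float
    first_spin_minutes: int
    spin_temp_c: float
    depletion_done_at: datetime               # end of 2nd spin or filtration
    frozen_at: datetime
    platelets_per_nl: Optional[float] = None  # residual count, if measured


def processing_issues(log: ProcessingLog) -> list[str]:
    """Return the protocol deviations found in one processing record."""
    issues = []
    if log.first_spin_start - log.drawn_at > MAX_TO_CENTRIFUGE:
        issues.append("centrifugation started > 30 min after draw")
    if not FIRST_SPIN_G[0] <= log.first_spin_g <= FIRST_SPIN_G[1]:
        issues.append("first spin outside 1300-2000 x g")
    if log.first_spin_minutes != FIRST_SPIN_MINUTES:
        issues.append("first spin not 10 min")
    if not SPIN_TEMP_C[0] <= log.spin_temp_c <= SPIN_TEMP_C[1]:
        issues.append("centrifuge not at room temperature (15-24 C)")
    if log.frozen_at - log.depletion_done_at > MAX_TO_FREEZER:
        issues.append("frozen > 30 min after platelet depletion")
    if log.platelets_per_nl is not None and log.platelets_per_nl >= MAX_PLATELETS_PER_NL:
        issues.append("plasma not platelet-poor (>= 10 platelets/nL)")
    return issues
```

For instance, a sample centrifuged 45 min after the draw in a centrifuge cooled to 4°C is flagged twice. These flags correspond to two of the lab-handling mistakes listed under Notes.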

4. Notes

4.1. Frequently Made Mistakes

4.1.1. Blood Withdrawal

• The patient was not fasting (i.e., had eaten before sampling).
• The blood was drawn from an infusion line.
• The blood was drawn with the patient in the wrong position (e.g., supine, upright).
• The consumables used were different from those recommended.
• The consumables had passed their expiry date.
• The tubes were not properly filled.
• The tubes were agitated vigorously (instead of being shaken gently to dissolve the anticoagulant).
• The blood sample tubes were not consistently kept at room temperature.
• The sample tubes were put on ice or in a refrigerator.

4.1.2. Lab Handling

• Centrifugation was delayed by more than 30 min after blood withdrawal.
• A refrigerated centrifuge was set below room temperature.
• The centrifugation speed was wrong (e.g., revolutions per minute were set instead of g-force).
• The centrifugation time was wrong.
• Plasma was pipetted off without proper caution, so that the buffy coat or the red blood cells were stirred up.
• The second centrifugation of the recovered plasma was delayed after the first centrifugation.

4.1.3. Storage of Samples

• Storage of the samples was delayed.
• The storage temperature was above –80°C.
• The labels on the sample containers were unreadable or could be confused.
• Labels were not properly attached to the sample containers, so that they were lost during storage or handling.

4.1.4. General Recommendations

• A proper first centrifugation should produce a visible white blood cell layer (buffy coat) between the red blood cells and the plasma. If it does not, the centrifugation speed or time may be wrong.
• Plasma that is icteric or shows signs of hemolysis should be discarded. Consult an expert to determine whether this is due to the particular disease.

References

1. Vitzthum F, Behrens F, Anderson NL, Shaw JH. (2005) Proteomics: from basic research to diagnostic application. A review of requirements and needs. J. Proteome Res. 4, 1086–97.

2. Lathrop JT, Anderson NL, Anderson NG, Hammond DJ. (2003) Therapeutic potential of the plasma proteome. Curr. Opin. Mol. Ther. 5, 250–7.



3. Wang W, Zhou H, Lin H, Roy S, Shaler TA, Hill LR et al. (2003) Quantification of proteins and metabolites by mass spectrometry without isotopic labeling or spiked standards. Anal. Chem. 75, 4818–26.

4. Anderson NL, Anderson NG. (2002) The human plasma proteome: history, character, and diagnostic prospects. Mol. Cell. Proteomics 1, 845–67.

5. Omenn GS. (2004) The Human Proteome Organization Plasma Proteome Project pilot phase: reference specimens, technology platform comparisons, and standardized data submissions and analyses. Proteomics 4, 1235–40.

6. Plebani M, Carraro P. (1997) Mistakes in a stat laboratory: types and frequency. Clin. Chem. 43, 1348–51.

7. Burtis CA, Ashwood E. (eds) (2001) Fundamentals of Clinical Chemistry. Saunders, Philadelphia.

8. Guder WG, Narayanan S, Wisser H, Zawta B. (2003) Samples: From the Patient to the Laboratory. The Impact of Preanalytical Variables on the Quality of Laboratory Results. GIT Verlag, Darmstadt, Germany.

9. Evans MJ, Livesey JH, Ellis MJ, Yandle TG. (2001) Effect of anticoagulants and storage temperatures on stability of plasma and serum hormones. Clin. Biochem. 34, 107–12.

10. Omenn GS, States DJ, Adamski M, Blackwell TW, Menon R, Hermjakob H et al. (2005) Overview of the HUPO Plasma Proteome Project: results from the pilot phase with 35 collaborating laboratories and multiple analytical groups, generating a core dataset of 3020 proteins and a publicly-available database. Proteomics 5, 3226–45.

11. Rai AJ, Gelfand CA, Haywood BC, Warunek DJ, Yi J, Schuchard MD et al. (2005) HUPO Plasma Proteome Project specimen collection and handling: towards the standardization of parameters for plasma proteome samples. Proteomics 5, 3262–77.

12. Villanueva J, Shaffer DR, Philip J, Chaparro CA, Erdjument-Bromage H, Olshen AB et al. (2006) Differential exoprotease activities confer tumor-specific serum peptidome patterns. J. Clin. Invest. 116, 271–84.

13. Liotta LA, Petricoin EF. (2006) Serum peptidome for cancer detection: spinning biologic trash into diagnostic gold. J. Clin. Invest. 116, 26–30.

14. Tammen H, Schulte I, Hess R, Menzel C, Kellmann M, Schulz-Knappe P. (2005) Prerequisites for peptidomic analysis of blood samples: I. Evaluation of blood specimen qualities and determination of technical performance characteristics. Comb. Chem. High Throughput Screen. 8, 725–33.

15. Holland NT, Smith MT, Eskenazi B, Bastaki M. (2003) Biological sample collection and processing for molecular epidemiological studies. Mutat. Res. 543, 217–34.

16. Landi MT, Caporaso N. (1997) Sample collection, processing and storage. IARC Sci. Publ. 223–36.

17. Tammen H, Schulte I, Hess R, Menzel C, Kellmann M, Mohring T, Schulz-Knappe P. (2005) Peptidomic analysis of human blood specimens: comparison between plasma specimens and serum by differential peptide display. Proteomics 5, 3414–22.



18. Favaloro EJ, Soltani S, McDonald J. (2004) Potential laboratory misdiagnosis of hemophilia and von Willebrand disorder owing to cold activation of blood samples for testing. Am. J. Clin. Pathol. 122, 686–92.

19. Mustard JF, Kinlough-Rathbone RL, Packham MA. (1989) Isolation of human platelets from plasma by centrifugation and washing. Methods Enzymol. 169, 3–11.

20. Schuchard MD, Mehigh RJ, Cockrill SL, Lipscomb GT, Stephan JD, Wildsmith J et al. (2005) Artifactual isoform profile modification following treatment of human plasma or serum with protease inhibitor, monitored by 2-dimensional electrophoresis and mass spectrometry. Biotechniques 39, 239–47.

21. Jeffrey DH, Deidra B, Keith H, Shu-Pang H, Deborah LR, Gregory JO, Stanley AH. (2004) An Investigation of Plasma Collection, Stabilization, and Storage Procedures for Proteomic Analysis of Clinical Samples. Humana, Totowa, NJ.

22. Rai AJ, Vitzthum F. (2006) Effects of preanalytical variables on peptide and protein measurements in human serum and plasma: implications for clinical proteomics. Expert Rev. Proteomics 3, 409–26.


3

Tissue Sample Collection for Proteomics Analysis

Jose I. Diaz, Lisa H. Cazares, and O. John Semmes

Summary

Successful collection of tissue samples for molecular analysis requires careful attention to several critical factors. We describe here our procedure for collecting tissue specimens for proteomic purposes, with emphasis on the most important steps. These include timing issues; the procedures for immediate freezing, storage, and microdissection of the cells of interest, or “tissue targets”; and the preparation of lysates for protein isolation for SELDI, MALDI, and 2DGE applications.

The pathologist is the cornerstone of this process and an invaluable collaborator. In most institutions, pathologists are responsible for “tissue custody” and closely supervise the tissue bank. In addition, they are optimally trained in histopathology and can therefore help investigators correlate tissue morphology with molecular findings. In recent years, the advent of the laser capture microscope, a tool ideally suited to pathologists, has greatly improved the efficiency of collecting tissue targets for molecular analysis.

Key Words: tissue bank; frozen section; immunofluorescence; laser capture microscope; proteomics.

1. Introduction

From the completion of surgery and the acquisition of the tissue sample through protein isolation and the various proteomic techniques, a number of challenges must be overcome. The first challenge is time. Surgery is associated with loss of the vascular supply, resulting in a progressive increase in endogenous protease activity, protein degradation, and tissue autolysis. For this reason, specimens submitted for tissue procurement must be processed without delay. Formalin fixation, a standard processing procedure in pathology, stops protease activity. However, formalin is a cross-linking fixative that irreversibly alters proteins, thus compromising the quality of the extracts for most proteomic techniques. Recent technical developments appear promising and may ultimately enable peptide analysis and protein identification (bottom-up proteomics) in formalin-fixed, paraffin-embedded tissue (1). At present, however, it is imperative to take a representative “fresh” tissue sample immediately after surgery when collecting tissue for proteomic studies, including MALDI-TOF MS and 2DGE. The surgical specimen should be transported quickly to pathology, and a representative tissue sample should be obtained under the supervision of a pathologist. The sample should be embedded in OCT (optimal cutting temperature) compound and frozen without delay. Ideally, a frozen section should be performed for quality assurance before the sample is archived. Once the pathologist confirms that the expected targets (for instance, tumor and non-tumor tissue) are present in the collected tissue, the frozen specimen can be stored in a –80°C freezer for subsequent use. Overcoming time constraints requires appropriate institutional policies and dedicated personnel. In our experience, transporting the surgical specimen from the operating room to pathology is better delegated to dedicated tissue procurement personnel than left to the surgical team. When collecting and archiving tissue samples, our policy is to bisect the sample. One half is embedded in OCT and stored permanently at –80°C for future molecular studies. The other half, a “mirror image,” is processed in formalin after a frozen section has been performed; it serves for morphologic comparison and for cell-type mapping after basic hematoxylin and eosin (H&E) staining. This formalin-processed mirror-image tissue provides optimal morphological detail, which may be needed in the future. For instance, prostatic intraepithelial neoplasia (PIN) is very difficult to identify on frozen section slides; however, the formalin-fixed section, which closely mimics the frozen section, can be used for guidance.

After archiving the tissue sample, the next challenge is to ensure that the proteomic findings are representative of the tissue targets under investigation, given the cellular heterogeneity present in most tissues. For instance, to determine differential protein expression in tumor versus non-tumor tissue, one must ensure that proteins are extracted separately and reliably from normal and tumor cells. Many solid tumors are visible to the naked eye, and both tumor and non-tumor tissues can be collected by gross inspection. Under the microscope, however, the tumor bed contains not only tumor cells but also many other tumor-associated, non-tumoral elements, such as supporting stromal cells, blood vessels, and infiltrating lymphocytes. Moreover, microscopic foci of tumor may infiltrate grossly normal tissue. In the past, various approaches were used to collect cells from tissue sections, including manual microdissection with a syringe. In recent years, laser capture microdissection (LCM) (2) has tremendously increased the quality, specificity, and speed of this process, allowing selective capture of cells and various tissue elements while preserving their molecular integrity (3,4,5).

The LCM instrument is a specialized microscope that isolates cells from frozen or formalin-fixed tissues and from cytological preparations. Microdissection of single cells or multicellular structures is accomplished by placing a plastic polymer cap over the tissue and pulsing an infrared laser, which melts the polymer so that it adheres to the target cells under the laser ring. When the cap is removed, the cells that adhered to the polymer detach from the surrounding tissue without molecular damage and are suitable for the extraction of high-quality nucleic acids and proteins and for a wide range of downstream molecular analyses, such as gene expression microarrays or proteomics. The microscope can be coupled with special immunostaining procedures to capture specific cell types that are not easily identified by morphology alone; this so-called immunocapture procedure (6,7) further enhances the specificity of tissue procurement for molecular analysis. For example, in a previous study (8), we were able to selectively capture basal cells from benign prostate glands, which are extremely difficult to recognize morphologically but easily identified after immunostaining for high-molecular-weight cytokeratin (Fig. 1). We obtained protein of excellent quality and identified several protein peaks preferentially expressed in these cells using SELDI-TOF-MS. When we compared protein spectra from sections of the same tissue sample routinely stained with hematoxylin with those immunostained for high-molecular-weight cytokeratins, the spectra did not differ, arguing against any significant protein deterioration due to the immunostaining procedure.

Fig. 1. Selective immunofluorescent LCM of prostate gland basal cells by immunocapture: (A) immunofluorescent staining of basal cells with a mAb against high-molecular-weight keratins, which are highly expressed in basal cells; (B) selection of immunofluorescence-positive basal cells for subsequent LCM; (C) captured immunofluorescence-positive cells after LCM, photographed on the plastic cap; (D) the remainder of the gland after removal of the basal cell layer by LCM.

2. Materials

2.1. Tissue Collection and Storage

1. Tissue-Tek Cryomold-standard (Sakura, Torrance, CA)

2. Tissue-Tek OCT (Sakura)

3. 2-Methylbutane (Mallinckrodt, St. Louis, MO)

4. Shandon Histobath II (Thermo Electron Corp., Waltham, MA)

5. –80°C freezer

2.2. Frozen Tissue Sectioning and Staining

1. Cryostat

2. HistoGene™ LCM Frozen Section Staining Kit (Arcturus Biosciences Inc., Mountain View, CA). The kit contains HistoGene staining solution, ethanol (75, 95, and 100%), xylene, nuclease-free distilled water, HistoGene LCM slides, and disposable slide staining jars.

3. 1× PBS made from 10× stock (Fisher Scientific)

4. Acetone (high-purity grade)

5. Cy3-streptavidin (Invitrogen, Carlsbad, CA)

6. Biotinylated mAbs: any antibody can be biotinylated. We routinely have 1.5 mg of antibody labeled with 0.2 mg of biotin (Alpha Diagnostic Intl. Inc., San Antonio, TX).

2.3. LCM

1. PixCell II LCM System (Arcturus Biosciences Inc.)

2. AutoPix™ Automated LCM System (Arcturus Biosciences Inc.)

3. CapSure® LCM caps (Arcturus Biosciences Inc.)

4. Prep Strip (Arcturus Biosciences Inc.)

5. Microcentrifuge tubes (0.5 ml) (Eppendorf North America)



2.4. LCM Lysate

1. Micropipet capable of accurately delivering 1 μl

2. 20 mM HEPES (adjusted to pH 8.0 with NaOH) with 1% Triton X-100

3. Sonicator (optional)

4. 1× PBS

2.5. SELDI Analysis

1. IMAC3 or WCX2 ProteinChip arrays (Ciphergen Biosystems, Palo Alto, CA)

2. HPLC-grade water (Fisher Scientific)

3. 100 mM sodium acetate, pH 4.0

4. 100 mM ammonium acetate, pH 4.0

5. Sinapinic acid (SPA) (Ciphergen Biosystems, Palo Alto, CA)

6. Optima-grade acetonitrile (Fisher Scientific)

7. Trifluoroacetic acid, packaged in 1-ml ampules (Pierce Chemical Company, Rockford, IL)

2.6. MALDI Analysis

1. Target plate

2. α-Cyano-4-hydroxycinnamic acid (CHCA) (Bruker Daltonics, Palo Alto, CA)

3. SPA (Fluka)

4. Optima-grade acetonitrile (Fisher Scientific)

5. Trifluoroacetic acid, packaged in 1-ml ampules (Pierce Chemical Company)

3. Method

3.1. Tissue Collection and Storage

1. Embed the tissue sample in OCT using a cryomold and freeze it in the Shandon Histobath, which contains 2-methylbutane (see Note 1).

2. Hold the cryomold against the 2-methylbutane liquid interface and allow the tissue to freeze slowly (3–5 min) (see Note 2).

3. Once the sample is completely frozen, place the cryomold containing the sample in a plastic bag and transport it in a liquid nitrogen container. Store the sample in a –80°C freezer.

3.2. Frozen Tissue Sectioning and Staining

3.2.1. Regular Hematoxylin Staining

Prior to LCM, cut 8-μm-thick frozen tissue sections on the cryostat (discard folded or wrinkled sections). Keep the slides with sections in the cryostat after cutting, and stain them as follows (see Notes 3 and 9; slides may also be frozen at –80°C until stained):



1. Remove the slides from the freezer or cryostat and place them in 70% ethanol (30 s).

2. Place in purified water (5 s).

3. Add the HistoGene staining solution (30 s) (see Note 4).

4. Rinse the slides with purified water.

5. Wash with 70% ethanol (60 s).

6. Wash twice with 95% ethanol (60 s each).

7. Wash with 100% ethanol (60 s).

8. Place the slides in xylene to ensure complete dehydration (10 min) (see Note 5).

9. Shake off the xylene and drain carefully by touching the corner of the slide with particle-free tissue paper.

10. Air-dry the slides to allow the xylene to evaporate completely (at least 2 min).

11. The slides are now ready for LCM; they should not be coverslipped (see Note 12).

3.2.2. Immunofluorescence Staining (see Note 7)

1. Thaw the slides (1 min).

2. Place in cold acetone at 4°C (2 min).

3. Air-dry (30 s).

4. Wash in filtered 1× PBS, pH 7.4.

5. Drain the slides.

6. Add 100 μl of the biotinylated primary antibody at its optimal dilution (recommended concentration 30–100 μg/ml; optimize for best results) (3 min).

7. Rinse in PBS.

8. Add 100 μl of Cy3-streptavidin at a 1:100 dilution (1 min). The optimal staining concentration of the Cy3-streptavidin conjugate can be determined by a serial dilution staining experiment.

9. Rinse in PBS.

10. Place the slides in 75% ethanol (30 s).

11. Place the slides in 95% ethanol (30 s).

12. Place the slides in 100% ethanol (30 s).

13. Place the slides in xylene (5 min) (see Note 6).

14. Air-dry (5 min).
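Note 7 limits the total immunostaining and dehydration procedure to 30 min. The timed steps above add up to 19 min; the sketch below totals them so the budget can be checked when steps are modified. The durations are taken from Section 3.2.2, but the 30 s assigned to each untimed step (PBS washes, draining, rinses) is a hypothetical placeholder, not a value from the protocol.

```python
"""Total the immunofluorescence staining steps (Section 3.2.2) and
compare the result with the 30-min limit recommended in Note 7."""

# Hypothetical duration (s) assumed for steps that have no stated time.
ASSUMED_UNTIMED_S = 30

# (step, duration in seconds); None marks a step with no time in the protocol.
STEPS = [
    ("thaw slides", 60),
    ("cold acetone, 4 C", 120),
    ("air dry", 30),
    ("wash in PBS", None),
    ("drain slides", None),
    ("biotinylated primary antibody", 180),
    ("rinse in PBS", None),
    ("Cy3-streptavidin", 60),
    ("rinse in PBS", None),
    ("75% ethanol", 30),
    ("95% ethanol", 30),
    ("100% ethanol", 30),
    ("xylene", 300),
    ("air dry", 300),
]

BUDGET_S = 30 * 60  # Note 7: do not exceed 30 min in total


def total_seconds(steps, untimed_s=ASSUMED_UNTIMED_S):
    """Sum step durations, substituting untimed_s where no time is given."""
    return sum(untimed_s if t is None else t for _, t in steps)


if __name__ == "__main__":
    timed_only = total_seconds(STEPS, untimed_s=0)
    estimate = total_seconds(STEPS)
    print(f"timed steps only: {timed_only / 60:.1f} min")
    print(f"with assumed untimed steps: {estimate / 60:.1f} min")
    print("within budget" if estimate <= BUDGET_S else "over budget")
```

Keeping the steps as data rather than hard-coding a total makes it easy to see how much slack remains if, for example, the xylene incubation is lengthened as suggested in Note 6.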

3.3. LCM

The newer instruments developed by Arcturus, such as the AutoPix™ and the Veritas™, are fully automated systems operated entirely by computer. We describe here the LCM procedure using the PixCell II instrument, which is manually operated and is currently the least expensive LCM instrument and, therefore, the more widely used (see Note 8).

1. Turn on the instrument, enter pertinent data such as slide #, case #, cap lot #, and section thickness (always 8 μm), and place the stained slide on the mechanical stage (see Note 10).



2. Turn on the vacuum pump to immobilize the slide (small aperture on the left side of the stage) and push in the filter bottom for optimal image quality.

3. Place the caps in the rail on the right side of the stage. Unlock the mechanical arm, move it toward the tissue, and lower the cap onto the tissue. Align the joystick so that the stage is in a centered and perpendicular position before beginning microdissection.

4. Turn the key on the right side of the power supply to enable the infrared laser. Before beginning microdissection, focus the laser using the smallest ring diameter, then adjust to the desired diameter.

5. Select the appropriate energy (mW) and exposure time (ms) for the desired laser ring diameter, and verify their effectiveness on an area of tissue that is not of interest, using a cap that will be discarded (see Note 11).

6. Fire the laser each time the ring is over the desired tissue target. Move the stage supporting the glass slide with the joystick, which allows fine and precise motion. Check that the tissue is appropriately microdissected, and capture images of the tissue before and after LCM as well as an image of the target tissue captured on the cap (see Note 13).

7. When the cap is filled with the desired amount of tissue, remove it and use a 0.5-ml microcentrifuge tube to collect the tissue (the cap is designed to fit the tube exactly and close it) (see Note 14).

8. The microcentrifuge tube can be safely stored in a –80°C freezer without adding any buffer and without lysing the cells; lysis can be done later at a convenient time.

3.4. LCM Lysate

1. Lyse a total of 1500–2000 laser shots (about 3000–6000 microdissected cells) in 4 μl of 20 mM HEPES, pH 8.0, with 1% Triton X-100. This is sufficient for one SELDI protein array or one MALDI run. For 2D analysis, a minimum of approximately 25,000 cells is necessary.

2. Add the lysis buffer onto the cap while it is seated on the microcentrifuge tube. This is usually done with two additions of 2 μl to the LCM cap. Pipet up and down and scrape the surface of the LCM cap to remove all the cells; a gentle scraping motion with the pipet tip may be necessary, but be careful not to tear the polymer film (see Note 15). Transfer the lysate from the surface of the cap to the microcentrifuge tube. Cells from multiple caps may be combined by using the same 4 μl of LCM lysate to lyse the cells on another cap; in this way, the volume remains small. If 2DGE is to be performed, the lysis procedure is different (see below). Make a 1:10 dilution of each lysate in PBS (for IMAC3 SELDI chips) or in 100 mM ammonium acetate, pH 4.0 (for WCX2 chips), i.e., add 36 μl to the 4 μl of lysate, and vortex for at least 1 min (see Note 16). Spin down briefly.

3. Prepare the arrays of the IMAC chip with CuSO4 according to the manufacturer's specifications: 20 μl of 100 mM CuSO4 for 10 min, then wash with HPLC water; 20 μl of 100 mM sodium acetate, pH 4.0, for 5 min, then wash with water. Use the MicroMix shaker for all incubations with the following settings: Form 20, Amplitude 5.



4. Assemble the bioprocessor with the desired number of chips, add 200 μl of PBS to each well, and incubate on the shaker for 5 min; repeat once. Pretreat the WCX2 chip with 100 mM ammonium acetate, pH 4.0. This can be done on the BioMek robot.

5. Add the diluted lysate to the spot on the chip(s) in the bioprocessor.

6. Cover the bioprocessor with a plastic seal and incubate overnight on the MicroMix shaker at room temperature, using the settings given above.

7. Remove the lysates carefully with a pipet without touching the surface of the arrays. Save them if needed for another experiment.

8. Wash the spots in the bioprocessor twice with 200 μl of PBS (for IMAC) or 100 mM ammonium acetate, pH 4.0 (for WCX2), for 5 min each on the shaker.

9. Wash the arrays twice with HPLC water for 5 min each (on the shaker).

10. Remove the chip(s) from the bioprocessor and give them a final rinse with HPLC water.

11. Let the chips dry completely, usually overnight.

12. Add 0.5 μl of saturated SPA dissolved in 50% acetonitrile, 0.5% TFA, twice.

13. Read at instrument settings optimized for resolution and intensity over the m/z range of 1000–20,000. Higher laser energy is required to detect higher-molecular-weight peaks.
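Step 2 uses "1:10" to mean one part lysate in ten parts total (36 μl of diluent added to 4 μl of lysate). The sketch below encodes that convention and the chip-to-diluent choice from step 2; the function and dictionary names are illustrative only.

```python
"""Dilution arithmetic for LCM lysates before SELDI (Section 3.4, step 2)."""

# Diluent for each chip type, as given in step 2.
DILUENT_FOR_CHIP = {
    "IMAC3": "PBS",
    "WCX2": "100 mM ammonium acetate, pH 4.0",
}


def diluent_volume_ul(sample_ul: float, fold: float) -> float:
    """Diluent to add for a 1:fold dilution, where the sample is 1/fold of the total."""
    if sample_ul <= 0 or fold < 1:
        raise ValueError("sample volume must be positive and fold must be >= 1")
    return sample_ul * (fold - 1)


def dilution_plan(chip: str, lysate_ul: float = 4.0, fold: float = 10.0):
    """Return (diluent, diluent volume in ul, final volume in ul) for one lysate."""
    diluent = DILUENT_FOR_CHIP[chip]
    return diluent, diluent_volume_ul(lysate_ul, fold), lysate_ul * fold


if __name__ == "__main__":
    for chip in DILUENT_FOR_CHIP:
        diluent, add_ul, total_ul = dilution_plan(chip)
        print(f"{chip}: add {add_ul:g} ul of {diluent} -> {total_ul:g} ul total")
```

The same helper applies to the 1:10 dilution of lysate in matrix for MALDI, as long as "1:10" is read with the same one-in-ten convention.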

One method of MALDI sample preparation that reduces the complexity of cell lysates while remaining robust and easily amenable to automated high-throughput applications is sample fractionation using magnetic beads (MB) combined with prestructured MALDI sample supports (AnchorChip technology). Several magnetic bead types with different surface chemistries can be used to fractionate serum and increase the number of detectable peaks (see the chapter on serum protein profiling for details). For MALDI analysis, dilute the lysate 1:10 with CHCA or SPA matrix (5–10 mg/ml in 50% acetonitrile, 0.1% TFA). Spot onto an AnchorChip plate and read in a MALDI instrument. Further dilution and/or fractionation of the lysate may be necessary to achieve optimal spectra.

If 2DGE analysis will be performed, lyse the cells as follows. Remove the LCM cap from the tube and add a small volume (10 μl) of first-dimension (IEF) rehydration buffer to the tube. The preferred number of laser shots is approximately 100 K. Replace the cap and invert the tube so that the buffer comes into contact with the cells on the cap and lyses them. Incubate for 5 min at room temperature. Sonicate the samples to ensure complete lysis. Continue with the basic protocol for first-dimension IEF and 2D analysis.

4. Notes

1. In our experience, a time window of 30 min between completion of surgery and tissue freezing yields good protein quality for most proteomic techniques. However, if protein phosphorylation is being studied, note that phosphorylation begins to decrease significantly 20 min after completion of surgery (10).



2. When freezing the tissue sample in the Histobath, avoid immediate and complete immersion in 2-methylbutane in order to preserve optimal tissue morphology. Hold the sample at the liquid interface with minimal immersion and wait until the OCT and the tissue slowly turn white.

3. Use uncoated glass slides for LCM. Coated or electrically charged glass slides interfere with the detachment of tissue by the plastic polymer and are not suitable for LCM.

4. Precipitate from hematoxylin can contaminate the surface of the tissue, so filter the staining solutions. Add one tablet of protease inhibitor to each staining bath (we use Complete, from BMB), but do not add protease inhibitor to the alcohol baths. These precautions are not necessary when using the HistoGene staining kit (Arcturus) for frozen sections.

5. Change all staining and alcohol solutions after every 20 slides.

6. Poor transfers may result if the 100% ethanol has absorbed water. Increasing the incubation time in xylene often improves transfer.

7. When specific cells need to be microdissected but cannot be identified morphologically, the cells of interest can be immunostained with specific mAbs against proteins highly expressed in those cells (immunophenotype). It is critical to expedite the immunostaining procedure, because the shorter the immunostaining time, the better the protein quality. The total immunostaining and dehydration procedure should not exceed 30 min. In the past, we used the immunoperoxidase technique with DAB labeling (6), but it was difficult to perform quickly enough to preserve optimal protein integrity. In addition, manual microdissection of DAB-labeled cells with the PixCell II is extremely tedious and impractical. The immunofluorescence staining method (7) is faster and easier to perform. This method, coupled with the AutoPix instrument, which has dark-field fluorescence and automation capabilities, is the ideal procedure for immunocapture. Because Cy3-streptavidin binds directly to the biotin-labeled antibody, no secondary antibody is needed, which shortens the staining time. We recommend running a negative control stain with a biotinylated control antibody from the same animal species and of the same isotype as the primary antibody, diluted to the same working concentration as the primary antibody.

8. Always wear gloves while performing LCM, including when handling the plastic caps.

9. The thickness of the tissue section is a critical parameter for effective LCM. In our experience (using the PixCell II and AutoPix instruments by Arcturus), 8 μm is the optimal thickness for LCM.

10. Smooth the surface of the tissue section with a Prep Strip before placing the slide on the LCM instrument; this improves the efficiency and uniformity of microdissection.

11. The main factors affecting the efficiency of LCM are the energy, the exposure time, and the diameter of the laser beam. Regarding diameter, on the PixCell II the smallest ring is 7 μm, the medium ring is 15 μm, and the widest ring is 30 μm. We have most often used the medium ring (15 μm), which lifts about three cells with each shot. Microdissecting single cells with the PixCell II requires the smallest (7 μm) ring, but our experience with it was frustrating. With the AutoPix, we have observed that individual cells are better microdissected with the laser ring set at a 10-μm diameter; below this diameter it becomes very difficult to lift cells efficiently. A 30-μm laser ring is very effective for microdissection of whole glands and other large tissue structures. The other two parameters must be optimized for each tissue type. For instance, for prostate tissue, an energy of 80 mW with a duration of 0.5 ms is usually effective for the medium ring (15 μm). These parameters are tuned by trial and error, progressively adjusting the energy and exposure time for the desired diameter, which in turn depends on the microdissection task (single cells vs. medium- or large-size tissue structures).

12. Another factor that affects the effectiveness of LCM is how long the tissue section has been dry after staining and dehydration. Ideally, the tissue should be stained and microdissected within 1 h if possible. Avoid keeping the slide under LCM for more than 4 h. If microdissecting many tissues, stain only four slides at a time.

13. When capturing images before and after microdissection for documentation, make sure the image on the monitor is in focus, because that is the image that will be captured; sometimes the image is focused in the microscope but not on the monitor. In a typical experiment, you will capture images before and after firing the laser, which document how effectively the target cells were removed. You can also capture an image of the microdissected cells on the polymer cap.

14. Avoid overcrowding the LCM caps. With the 15-μm laser ring, about three cells are microdissected per shot, so one should expect around 3000 cells for every 1000 shots, which is about right for a single cap.

15. LCM caps can be viewed under a dissecting microscope to ensure that all cells have been removed from the polymer film after the lysis procedure.

16. Depending on the cell type, vigorous vortexing and sonication may be necessary to completely lyse the cells after they are removed from the cap.
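Notes 11 and 14 give the working figures for the 15-μm ring: about three cells per shot and about 1000 shots (roughly 3000 cells) per cap. A minimal sketch, assuming those figures hold for the tissue at hand (they vary with ring diameter and tissue type), estimates how many shots and caps a target cell count requires, for example the approximately 25,000 cells needed for 2D analysis (Section 3.4, step 1). The function name is illustrative.

```python
"""Estimate laser shots and caps needed for a target number of cells."""
import math

# Assumed defaults from Notes 11 and 14 (15-um ring); adjust per tissue and ring.
CELLS_PER_SHOT = 3
SHOTS_PER_CAP = 1000


def shots_and_caps(target_cells, cells_per_shot=CELLS_PER_SHOT,
                   shots_per_cap=SHOTS_PER_CAP):
    """Return (shots, caps), rounding both up to whole numbers."""
    if target_cells <= 0 or cells_per_shot <= 0 or shots_per_cap <= 0:
        raise ValueError("all arguments must be positive")
    shots = math.ceil(target_cells / cells_per_shot)
    caps = math.ceil(shots / shots_per_cap)
    return shots, caps


if __name__ == "__main__":
    for label, cells in [("one SELDI array (upper estimate)", 6000),
                         ("2D analysis (minimum)", 25000)]:
        shots, caps = shots_and_caps(cells)
        print(f"{label}: {cells} cells -> {shots} shots on {caps} caps")
```

Rounding up at both stages keeps the estimate conservative, so the last cap is never planned beyond the crowding limit of Note 14.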

References

1. Prieto, D.A., Hood, B.L., Darfler, M.M., Guiel, T.G., Lucas, D.A., Conrads, T.P., Veenstra, D.T., and Krizman, D.B. (2005) Liquid Tissue™: proteomic profiling of formalin-fixed tissues. Biotechniques 38: 32–5.

2. Emmert-Buck, M.R., Bonner, R.F., Smith, P.D., Chuaqui, R.F., Zhuang, Z., Goldstein, S.R., Weiss, R.A., and Liotta, L.A. (1996) Laser capture microdissection. Science 274: 998–1001.

3. Espina, V., Milia, J., Wu, G., Cowherd, S., and Liotta, L.A. (2006) Laser capture microdissection. Methods Mol Biol 319: 213–29.

4. Best, C.J., and Emmert-Buck, M.R. (2001) Molecular profiling of tissue samples using laser capture microdissection. Expert Rev Mol Diagn 1: 53–60.

5. Ornstein, D.K., Gillespie, J.W., Paweletz, C.P., Duray, P.H., Herring, J., Vocke, C.D., Topalian, S.L., Bostwick, D.G., Linehan, W.M., Petricoin, E.F., III, and Emmert-Buck, M.R. (2000) Proteomic analysis of laser capture microdissected human prostate cancer and in vitro prostate cell lines. Electrophoresis 21: 2235–42.

6. Fend, F., Emmert-Buck, M.R., Chuaqui, R., Cole, K., Lee, J., Liotta, L.A., and Raffeld, M. (1999) Immuno-LCM: laser capture microdissection of immunostained frozen sections for mRNA analysis. Am J Pathol 154: 61–6.

7. Murakami, H., Liotta, L., and Star, R.A. (2000) IF-LCM: laser capture microdissection of immunofluorescently defined cells for mRNA analysis rapid communication. Kidney Int 58(3): 1346–53.

8. Cazares, L.H., Adam, B.L., Ward, M.D., Nasim, S., Schellhammer, P.F., Semmes, O.J., and Wright, G.L., Jr. (2002) Normal, benign, preneoplastic, and malignant prostate cells have distinct protein expression profiles resolved by surface enhanced laser desorption/ionization mass spectrometry. Clin Cancer Res 8: 2541–52.

9. Diaz, J., Cazares, L.H., Corica, A., and Semmes, O. (2004) Selective capture of prostatic basal cells and secretory epithelial cells for proteomic and genomic analysis. Urol Oncol 22(4): 329–36.

10. Mora, L., Buettner, R., Seigne, J., Diaz, J., Hamad, N., Garcia, R., Bowman, T., Falcone, R., Faigurth, R., Cantor, A., Muro-Cacho, C., Livistong, S., Levitzki, A., Kraker, A., Karras, J., Pow-Sang, J., and Jove, R. (2002) Constitutive activation of Stat3 in human prostate tumors and cell lines: direct inhibition of Stat3 signaling induces apoptosis of prostate cancer cells. Cancer Res 62: 6659–66.


4

Protein Profiling of Human Plasma Samples by Two-Dimensional Electrophoresis

Sang Yun Cho, Eun-Young Lee, Hye-Young Kim, Min-Jung Kang, Hyoung-Joo Lee, Hoguen Kim, and Young-Ki Paik

Summary

Human plasma is regarded as the most complex and best-known clinical specimen that can be easily obtained; alterations in the levels of plasma proteins or their corresponding enzyme activities may reflect either a healthy or a diseased state. Because genomic information alone does not define the intact protein components of plasma, protein profiling could be the first step toward its molecular characterization. Several problems exist in the analysis of plasma proteins, however. For example, the extremely wide dynamic range of protein concentrations, the presence of high-abundance proteins, and post-translational modifications must be considered before proteomic studies are undertaken. In particular, efficient depletion or prefractionation of high-abundance proteins is crucial for the identification of low-abundance proteins, which may include potential biomarkers. After the removal of high-abundance proteins, protein profiling can be initiated using two-dimensional electrophoresis (2DE), which has been widely used to display the differential proteome under specific physiological conditions. Here, we describe a typical 2DE procedure for analyzing the plasma proteome in either a healthy or a diseased state (e.g., liver cancer), in which prefractionation and depletion are integral steps in the search for disease biomarkers.

Key Words: two-dimensional gel electrophoresis; plasma; HPPP; immunoaffinity column.

Abbreviations: IEF: Isoelectric Focusing; IPG: Immobilized pH Gradient; TCA: Trichloroacetic Acid; FFE: Free Flow Electrophoresis; HPMC: Hydroxypropyl Methylcellulose; TBP: Tributylphosphine; 2DE: Two-Dimensional Gel Electrophoresis; BPB: Bromophenol Blue; CHCA: α-Cyano-4-hydroxycinnamic Acid; LTQ: Linear Ion Trap; MALDI-TOF: Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry; HPPP: Human Plasma Proteome Project.

From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols

Edited by: A. Vlahou © Humana Press, Totowa, NJ

1. Introduction

Human plasma is an intravascular fluid that serves as a liquid medium for blood proteins derived from various cells, tissues, and other biofluids (1). The components of plasma are very heterogeneous, including inorganic ions (e.g., bicarbonate, calcium), metabolic intermediates (e.g., cholesterol, glucose), and plasma proteins (e.g., albumin, globulins), which are important in maintaining body fluid balance, immune responses, blood clotting, and other homeostatic mechanisms. Plasma contains many different proteins that are primarily synthesized in the liver and are often subject to post-translational modification (PTM) (2).

Because human plasma is the most complex and best-known clinical specimen that can be easily obtained, it has been a central target of many biomedical studies (2). Alterations in the levels of plasma proteins or their corresponding enzyme activities may reflect either a healthy or a diseased state, which can be monitored by various analytical tools, including biochemical assays and proteomics. Because genomic information alone does not define the intact protein components of plasma, a proteomic study may be the method of choice (3,4). Recently, plasma protein profiling was conducted as part of the Plasma Proteome Project of the Human Proteome Organization (HUPO), termed HPPP (5). The pilot phase of HPPP identified 3020 nonredundant proteins present in human plasma and serum (5,6).

However, several points must be addressed before proteomic studies are undertaken. First, plasma is believed to span the widest dynamic range of protein concentrations (more than 10 orders of magnitude among its constituent proteins), which creates many technical obstacles for detection by mass spectrometry (MS) (2,3). For example, removing the high-abundance proteins (e.g., albumin, IgG, transferrin, fibrinogen, and IgA), which account for more than 90% of total plasma protein, prior to biochemical analysis can be a major challenge and may even be problematic for plasma-derived biomarker discovery (3,7). Second, because many plasma proteins exist as multiple structural isoforms, more efficient analytical systems are needed to analyze these isoforms (1). Third, because many plasma proteins are synthesized as preproteins that undergo various PTMs required for their function, more efficient methods are needed to analyze modified proteins (e.g., glycosylated proteins). For example, glycopeptides ionize poorly during MS analysis because of their attached glycans, which leads to inadequate spectral data and low detection sensitivity; a strategy for removing glycans must therefore be considered for protein identification. Taken together, all of these factors are important for the proteomic study of plasma (8).
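Why depletion matters can be made concrete with a short calculation. If high-abundance proteins make up a fraction f of total plasma protein (more than 90%, per the text) and a depletion step removes a fraction e of them, then loading the same total amount of protein afterwards raises the share of every remaining protein by a factor of 1/(1 − f·e). The sketch below assumes an idealized depletion that removes nothing but its targets; real columns also co-deplete some bound proteins, so the numbers are an upper bound rather than a measured result.

```python
"""Idealized enrichment of low-abundance proteins after depletion."""


def enrichment_factor(high_abundance_fraction: float,
                      removal_efficiency: float) -> float:
    """Fold increase in each remaining protein's share of a fixed protein load.

    Assumes the depletion removes only the targeted high-abundance proteins
    (no co-depletion or nonspecific loss), which is an idealization.
    """
    if not (0 <= high_abundance_fraction < 1 and 0 <= removal_efficiency <= 1):
        raise ValueError("fraction must be in [0, 1) and efficiency in [0, 1]")
    remaining = 1.0 - high_abundance_fraction * removal_efficiency
    return 1.0 / remaining


if __name__ == "__main__":
    for eff in (0.0, 0.5, 0.95, 1.0):
        print(f"removal efficiency {eff:.2f}: "
              f"{enrichment_factor(0.90, eff):.1f}-fold enrichment")
```

At f = 0.9 the ceiling is about tenfold, which is modest compared with a dynamic range of more than 10 orders of magnitude; this is why depletion is usually combined with prefractionation rather than used alone.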

Of the problems listed above, the first one to address in the protein profiling of plasma may be the depletion or prefractionation of high-abundance plasma proteins (3,4,7). Without this depletion step, the identification of low-abundance proteins (including biomarkers) may not be practical. After the removal of high-abundance proteins, 2DE may be the first method chosen to analyze plasma proteins, because it is easy to perform in the laboratory. Although 2DE has several limitations in terms of reproducibility, separation of membrane or low-molecular-weight proteins, and resolution of proteins with extreme pIs (10), this technique, coupled with MS, has been widely used as a first analysis of proteins in a particular physiological state (9). Recently, quantitative 2DE has been performed with a difference gel electrophoresis (DIGE) system (see the chapter by Friedman and Lilley for details), in which two or three spectrally distinct fluorescent dyes are used to label different protein populations so that quantitative changes in their expression levels under a specific physiological condition can be determined (10). This chapter is therefore intended to provide the reader with the information necessary for systematic analysis of the plasma proteome by 2DE in the search for disease biomarkers among the plasma proteins of patients with hepatocellular carcinoma (HCC) (11,12).

2. Materials

2.1. Preparation of Human Plasma Samples

1. Blood collection tubes: BD Plus plastic K2EDTA tubes (BD, 367525; 10 mL) and BD glass serum tubes with silica clot activator (367820; 10 mL).

2. Protease inhibitor (Complete Protease Inhibitor Cocktail, Roche, 11 697 498 001, 20 tablets): one tablet contains protease inhibitors (antipain, bestatin, chymostatin, leupeptin, pepstatin, aprotinin, phosphoramidon, and EDTA) sufficient for processing 100 mL of plasma sample. Prepare 25× stock solutions in 2 mL of distilled water.

2.2. Depletion of High-Abundance Proteins with an Immunoaffinity Column

1. HPLC system, such as the HP1100 LC system (Agilent).

2. Multiple Affinity Removal System (MARS): LC column (Agilent, 5185-5984); Buffer A for sample loading, washing, and equilibration (Agilent, 5185-5987); Buffer B for elution (Agilent, 5185-5988).


60 Cho et al.<br />

2.3. Isoelectric Focusing (IEF) with Immobilized pH Gradient (IPG) Strip

1. IEF unit: MultiPhor™ (GE Healthcare) or Protean IEF cell (Bio-Rad); numerous other commercial isoelectric focusing units are available.

2. Reswelling tray.

3. Mineral oil: Immobiline DryStrip Cover Fluid (GE Healthcare).

4. Power supply, such as the EPS 3501 XL power supply (GE Healthcare).

5. Thermostatic circulator: MultiTemp III thermostatic circulator (GE Healthcare).

6. IPG strips: Immobiline DryStrip, pH 3–10 nonlinear (NL), pH 4.0–5.0, and pH 5.5–6.7, 18 cm long, 0.5 mm thick (GE Healthcare), or ReadyStrip IPG strips (Bio-Rad) with the same pH ranges.

7. Carrier ampholyte mixtures: IPG buffer or Pharmalyte covering the same range as the selected IPG strip.

8. Sample buffer: 7 M urea, 2 M thiourea, 4% (w/v) CHAPS, 0.5% (v/v) ampholyte, 100 mM DTT, 40 mM Tris-HCl, pH 7.5, and a trace of bromophenol blue (BPB).

2.4. Microscale Solution Isoelectric Focusing: ZOOM®

1. ZOOM® IEF Fractionator (Invitrogen, ZF10001).

2. ZOOM® disks: pH 3.0, 4.6, 5.4, 6.2, 7.0, and 10.0 [Invitrogen, ZD series (e.g., ZD10030 for pH 3.0)].

3. IEF Anode Buffer (50X) (Novex, LC5300, 100 mL).

4. IEF Cathode Buffer (10X) (Novex, LC5310, 125 mL).

5. Anode buffer: 8.4 g urea, 3.0 g thiourea, and 3.3 mL Novex® IEF Anode Buffer (50X); add water to a final volume of 20 mL.

6. Cathode buffer: 8.4 g urea, 3.0 g thiourea, and 3.3 mL Novex® IEF Cathode Buffer (10X); add water to a final volume of 20 mL.

2.5. Fractionation of Plasma Samples by Free Flow Electrophoresis (FFE)

1. ProTeam™ FFE instrument (Tecan).

2. 1% 2-(4-sulfophenylazo)-1,8-dihydroxy-3,6-naphthalenedisulfonic acid (SPADNS) (Tecan, 517074).

3. 0.8% hydroxypropyl methylcellulose (HPMC) (Tecan, 5170709).

4. pI markers: a mixture of pI markers indicating pH 4.2, 5.1, 6.3, 7.4, 8.7, and 10.1 (Tecan, 5170705).

5. Prolyte™ 1, Prolyte™ 2, and Prolyte™ 3 (Tecan, 0309081, 0309102, and 0309093).

6. Anodic stabilization medium (Inlet I1): 14.5% (w/w) glycerol, 8 M urea, 0.03% (w/w) HPMC, 100 mM H2SO4.

7. Separation medium 1 (Inlet I2): 14.5% (w/w) glycerol, 8 M urea, 0.03% (w/w) HPMC, 14.5% (w/w) Prolyte™ 1.


Protein Profiling by Two-Dimensional Electrophoresis 61<br />

8. Separation medium 2 (Inlets I3–I5): 14.5% (w/w) glycerol, 8 M urea, 0.03% (w/w) HPMC, 14.5% (w/w) Prolyte™ 2.

9. Separation medium 3 (Inlet I6): 14.5% (w/w) glycerol, 8 M urea, 0.03% (w/w) HPMC, 14.5% (w/w) Prolyte™ 3.

10. Cathodic stabilization medium (Inlet I7): 14.5% (w/w) glycerol, 8 M urea, 0.03% (w/w) HPMC, 100 mM NaOH.

11. Counter-flow medium (Inlet I8): 14.5% (w/w) glycerol, 8 M urea.

12. Anodic circuit electrolyte: 100 mM H2SO4.

13. Cathodic circuit electrolyte: 100 mM NaOH.

2.6. Preparation of 2D Gels

1. Gradient former: either of two Bio-Rad models can be used: Model 385 (30–100 mL capacity) or Model 395 (100–750 mL capacity).

2. Orbital shaker with speed controller.

3. SDS-PAGE: Protean II xi multicell and multicasting chamber (Bio-Rad) or Ettan DALT twelve large vertical system (GE Healthcare).

4. 5× Tris-HCl buffer: Dissolve 227 g Tris in 800 mL distilled water and adjust to pH 8.8 with HCl (~30 mL). Add distilled water to a final volume of 1 L.

5. 5× gel buffer: Dissolve 15 g Tris, 72 g glycine, and 5 g sodium dodecyl sulfate (SDS) in 800 mL distilled water and add distilled water to a final volume of 1 L.

6. SDS equilibration buffer: 6 M urea, 2% (w/v) SDS, 5× gel buffer (pH 8.8), 50% (v/v) glycerol, and 2.5% (w/v) acrylamide monomer.

7. Acrylamide stock solution: acrylamide/bis-acrylamide 37.5:1, 40% (w/v) solution (Amresco, M157, 500 mL).

8. Fixing solution: 40% (v/v) methanol and 5% (v/v) phosphoric acid in distilled water.

9. Coomassie blue G-250 staining solution: 17% (w/v) ammonium sulfate, 3% (v/v) phosphoric acid, 34% (v/v) methanol, and 0.1% (w/v) Coomassie blue G-250 in distilled water.
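Because the 5× stocks in items 4 and 5 are defined by mass, it can be useful to confirm the working concentrations they give after fivefold dilution. The Python sketch below does this arithmetic; it assumes standard molecular weights (Tris 121.14 g/mol, glycine 75.07 g/mol) and exact 1-L final volumes, and its helper names are illustrative only.

```python
# Molecular weights (g/mol); standard values, assumed here
MW_TRIS = 121.14
MW_GLYCINE = 75.07

DILUTION = 5  # 5x stock -> 1x working concentration


def molarity(grams: float, mw: float, volume_l: float = 1.0) -> float:
    """Molar concentration of `grams` of solute made up to `volume_l` litres."""
    return grams / mw / volume_l


# Item 4: 227 g Tris per litre of 5x Tris-HCl (pH 8.8)
tris_hcl_1x_m = molarity(227, MW_TRIS) / DILUTION
# Item 5: 15 g Tris, 72 g glycine, and 5 g SDS per litre of 5x gel buffer
gel_tris_1x_m = molarity(15, MW_TRIS) / DILUTION
gel_glycine_1x_m = molarity(72, MW_GLYCINE) / DILUTION
gel_sds_1x_pct = (5 / 1000 * 100) / DILUTION  # g per 100 mL = % (w/v)

print(f"Tris-HCl buffer, 1x: {tris_hcl_1x_m * 1000:.0f} mM Tris")
print(f"Gel buffer, 1x: {gel_tris_1x_m * 1000:.1f} mM Tris, "
      f"{gel_glycine_1x_m * 1000:.0f} mM glycine, {gel_sds_1x_pct:.1f}% SDS")
```

Under these assumptions, the 5× Tris-HCl stock corresponds to about 375 mM Tris at 1×, and the 5× gel buffer to about 25 mM Tris, 192 mM glycine, and 0.1% (w/v) SDS, the usual Laemmli running-buffer composition.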

2.7. 2D Gel Image Analysis

1. Scanner with transparency unit, such as the Bio-Rad GS710 or GS800.

2. 2D gel image analysis program: ImageMaster Platinum 5 (GE Healthcare), PDQuest 7.3.0 (Bio-Rad), or Progenesis Discovery (Nonlinear Dynamics Ltd.).

2.8. Destaining, In-Gel Deglycosylation, and In-Gel Tryptic Digestion

1. SpeedVac (Heto).

2. PNGase F stock solution for in-gel deglycosylation: PNGase F (Glyko, Inc., GKE-5010). Dilute 1 μL PNGase F (2 mU) with 2.5 mL 1× N-glycanase incubation buffer (20 mM sodium phosphate, pH 7.5, and 0.02% (w/v) sodium azide).

3. Sequencing-grade modified trypsin (Promega, V5111, 100 μg, 18,100 U/mg).

4. 50 mM ammonium bicarbonate.

2.9. Desalting of Peptides and MALDI Plating

1. GELoader tips (Eppendorf, No. 0030 048.083, 20 μL capacity).

2. Poros 10 R2 resin (PerSeptive Biosystems, 1-1118-02, 0.8 g).

3. Oligo R3 resin (PerSeptive Biosystems, 1-1339-03, 6.3 g).

4. 2% (v/v) formic acid in 70% (v/v) acetonitrile (ACN).

5. 0.1% (v/v) trifluoroacetic acid in 70% (v/v) ACN.

6. 1-mL syringe.

7. Matrix: α-cyano-4-hydroxycinnamic acid (CHCA).

8. Opti-TOF™ 384-well insert (123 × 81 mm, 1016491, Applied Biosystems).

2.10. MALDI-TOF and Peptide Mass Fingerprinting

1. MALDI-TOF and MALDI-TOF/TOF instruments: Voyager DE-Pro and 4800 MALDI TOF/TOF™ Analyzer (Applied Biosystems); the 4800 analyzer is equipped with a 355-nm Nd:YAG laser. The pressure in the TOF analyzer is approximately 7.6 × 10⁻⁷ Torr.

3. Methods

3.1. Human Plasma Sample Preparation

This protocol is based on the HUPO reference sample collection protocol (13).

1. Each sample pool consisted of 400 mL blood from one healthy, fasting male and one healthy, fasting postmenopausal female, collected into 10-mL tubes by two venipunctures, 20 tubes per venipuncture (see Note 1).

2. Equal numbers of tubes and aliquots were prepared containing the appropriate concentrations of K2EDTA, lithium heparin, or sodium citrate to yield plasma, or the blood was allowed to clot at room temperature for 30 min (with micronized silica as the clot activator) to yield serum (see Note 2).

3. The specimens were centrifuged for 10–15 min at 2–6°C.

4. The serum or plasma from 10 centrifuged tubes of the same type from each donor was pooled into one secondary 50-mL conical-bottom BD Falcon™ tube per tube type.

5. Each secondary tube was centrifuged at 2400 × g for 15 min to remove residual cellular material from serum and to prepare platelet-poor plasma from the EDTA, heparin, and citrate secondary tubes.

6. Equal volumes of serum or plasma from each secondary tube were pooled into media bottles (see Note 3).

7. The pooled serum or plasma was mixed gently, kept on ice while being distributed as 20-μL aliquots into cryovials, and then frozen and stored at –70°C.



3.2. Depletion of High-Abundance Proteins with an Immunoaffinity Column

For efficient depletion of high-abundance proteins prior to molecular analysis, many reports have shown that commercially available immunoaffinity columns coupled to an HPLC system are convenient, such as MARS (Agilent) (2,3) or the prepacked 2-mL Seppro™ MIXED12 affinity LC column (GenWay Biotech) (14). To deplete the six most abundant proteins (i.e., albumin, transferrin, IgG, IgA, haptoglobin, and α1-antitrypsin) from either serum or plasma, we used MARS, which has been applied successfully to a wide variety of sample types, including cerebrospinal fluid (CSF) and follicular fluid (2,3) (see Fig. 1).

1. Dilute human serum or plasma fivefold with Buffer A (e.g., 20 μL human plasma with 80 μL Buffer A) containing the protease inhibitor stock solution (40 μL per 1 mL plasma) (see Note 4) (adapted from the manufacturer's instructions).

2. Remove particulates with a 0.22-μm spin filter (1 min at 16,000 × g).

3. Inject 75–100 μL of the diluted serum or plasma at a flow rate of 0.5 mL/min.

4. Collect the flow-through fractions that elute between 1.5 and 4.5 min, and store them at –20°C if they are not analyzed immediately.

5. Elute the bound proteins from the column with Buffer B (elution buffer) at a flow rate of 1 mL/min for 3.5 min.

6. Regenerate the column by equilibrating it with Buffer A for an additional 7.4 min at a flow rate of 1 mL/min.

Fig. 1. 2DE images of human plasma proteins before and after depletion of the six most abundant proteins with MARS. Proteins were isoelectrically focused on pH 3–10 NL IPG strips in the first dimension and resolved by 9–16% SDS-PAGE in the second dimension. (A) Whole plasma. (B) Flow-through from MARS. Approximately 800 protein spots are displayed by 2DE and identified by MALDI-TOF mass spectrometry. The names of the major proteins in each gel are marked on the image (5) (from (4) with permission).
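The dilution in step 1 is simple arithmetic, but the inhibitor dose is stated per volume of plasma rather than per volume of diluted sample. The sketch below makes that explicit. It assumes that the 40 μL/mL dose refers to the undiluted plasma and that the inhibitor stock is counted as part of the Buffer A volume; the function name is hypothetical.

```python
def mars_dilution(plasma_ul: float, fold: float = 5.0,
                  inhibitor_ul_per_ml_plasma: float = 40.0) -> dict:
    """Volumes for a fivefold dilution in Buffer A containing inhibitor stock.

    Assumption: the inhibitor stock is dosed per mL of undiluted plasma and
    is included in the Buffer A portion of the mixture.
    """
    total_ul = plasma_ul * fold
    buffer_a_total_ul = total_ul - plasma_ul  # Buffer A including inhibitor
    inhibitor_ul = plasma_ul * inhibitor_ul_per_ml_plasma / 1000.0
    plain_buffer_a_ul = buffer_a_total_ul - inhibitor_ul
    return {
        "total_ul": total_ul,
        "buffer_a_total_ul": buffer_a_total_ul,
        "inhibitor_ul": inhibitor_ul,
        "plain_buffer_a_ul": plain_buffer_a_ul,
    }


print(mars_dilution(20.0))
```

For 20 μL of plasma, this gives 100 μL of diluted sample (80 μL Buffer A, of which 0.8 μL is 25× inhibitor stock), which falls within the 75–100 μL injection volume of step 3.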

3.3. TCA/Acetone Precipitation

Compounds that interfere with 2DE, such as proteolytic enzymes, salts, lipids, nucleic acids, and any high-abundance proteins remaining after depletion, must be removed or inactivated beforehand. For plasma samples, the two most important factors are salt and proteolysis. TCA/acetone precipitation is an effective way to desalt both whole plasma and the MARS flow-through fractions.

1. Add 50% (w/v) trichloroacetic acid (TCA; Sigma, T9159) to a final TCA concentration of 5–8%. Mix gently by inverting the tube five or six times, and incubate on ice for 2 h.

2. Centrifuge the sample at 14,000 × g for 15 min and discard the supernatant.

3. Add 200 μL cold acetone and resuspend the protein pellet with a pipette.

4. Incubate on ice for 15 min, centrifuge at 14,000 × g for 20 min, discard the acetone, and air-dry the pellet (see Note 5).

5. Dissolve the pellet in 2DE sample buffer and determine the protein concentration with the Bradford protein assay.
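The volume of 50% TCA needed in step 1 follows from a simple mass balance, c_stock·v = c_final·(V_sample + v). The sketch below solves it for v. It treats the volumes as additive (ignoring any contraction on mixing), and the 1-mL sample volume in the example is only illustrative.

```python
def tca_stock_volume(sample_ul: float, final_pct: float,
                     stock_pct: float = 50.0) -> float:
    """Volume of TCA stock (uL) to add so that the mixture reaches final_pct.

    Mass balance: stock_pct * v = final_pct * (sample_ul + v)
    =>            v = sample_ul * final_pct / (stock_pct - final_pct)
    """
    if not 0 < final_pct < stock_pct:
        raise ValueError("final_pct must lie between 0 and stock_pct")
    return sample_ul * final_pct / (stock_pct - final_pct)


# Example: 1 mL of MARS flow-through (illustrative), 5-8% final TCA
for pct in (5, 6, 8):
    print(f"{pct}% TCA: add {tca_stock_volume(1000, pct):.0f} uL of 50% TCA")
```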

3.4. Rehydration of the IPG Gel Strip

For analytical purposes, typically 0.3–1.0 mg protein can be loaded onto an 18-cm IPG strip with a wide pH range (e.g., pH 3–10), or 0.5–2.0 mg onto a strip with a narrow pH range (e.g., pH 5.5–6.7). Narrow-range IPG strips usually give higher resolution, particularly when proteins are analyzed by sequential IEF: the proteins are first fractionated into several pI ranges in solution with ZOOM® disks or FFE (see Subheadings 3.6 and 3.7), and each fraction is then focused on an IPG strip [strips spanning one pH unit are also available (e.g., pH 3.0–4.0 or pH 3.5–4.5, up to pH 6.7)]. Because some proteins can become trapped in the disk membranes, sample loss at the partitions should be taken into account.

1. Dilute 1.0 mg protein with sample buffer to a final volume of 400 μL for 18-cm IPG strips (see Note 6).

2. Transfer the entire protein-containing sample buffer into the reswelling tray.

3. Peel the protective cover from the IPG strip and slowly lay the strip (gel side down) onto the sample solution. Avoid trapping air bubbles and distribute the sample solution evenly under the strip.

4. Overlay the strip with mineral oil and leave it for 12–16 h at room temperature (see Note 7 for cup loading).
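Step 1 depends on the protein concentration measured by the Bradford assay (Subheading 3.3, step 5). The sketch below converts that concentration into the volumes needed for a 1.0-mg load in 400 μL; the 5 mg/mL concentration in the example is hypothetical.

```python
def rehydration_mix(protein_mg_per_ml: float, load_mg: float = 1.0,
                    final_ul: float = 400.0) -> tuple[float, float]:
    """Return (protein solution in uL, extra sample buffer in uL)."""
    protein_ul = load_mg * 1000.0 / protein_mg_per_ml
    if protein_ul > final_ul:
        raise ValueError("protein solution is too dilute for this load and volume")
    return protein_ul, final_ul - protein_ul


protein_ul, buffer_ul = rehydration_mix(5.0)  # hypothetical 5 mg/mL sample
print(f"{protein_ul:.0f} uL protein solution + {buffer_ul:.0f} uL sample buffer")
```

If the calculated protein volume exceeds 400 μL, the function raises an error, signalling that the sample is too dilute for this load.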

3.5. IEF with IPG Strip

1. Remove the rehydrated IPG strips carrying the protein samples and place them (gel side up) on the strip tray.

2. Place 2.5-cm filter papers, wetted with distilled water, across both the cathodic and anodic ends of the strips. Place the strip tray on the IEF unit.

3. Cover the strips completely with mineral oil.

4. Program the instrument (e.g., Multiphor II) to increase the voltage stepwise from 100 to 3500 V until a total of 80,000 volt-hours (Vh) is reached (e.g., 300 Vh at 100 V, 600 Vh at 300 V, 600 Vh at 600 V, 1000 Vh at 1000 V, and 2000 Vh at 2000 V, followed by 3500 V until the total reaches 80,000 Vh) (see Notes 8 and 9).

5. Maintain the temperature at 20°C with a water circulator during IEF.
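The program in step 4 fixes the volt-hours of each ramp step and leaves the remainder of the 80,000 Vh to the final 3500-V phase. The sketch below computes that remainder and the nominal duration of each step (Vh divided by V). It assumes the set voltage is reached immediately in every step; in practice the instrument may be current-limited, particularly during the low-voltage desalting phase (see Note 8), so real run times will differ.

```python
def ief_schedule(ramp, final_voltage=3500.0, target_vh=80_000.0):
    """Return (voltage, volt-hours, hours) per step, ending with the final hold."""
    schedule = [(volts, vh, vh / volts) for volts, vh in ramp]
    remaining_vh = target_vh - sum(vh for _, vh in ramp)
    if remaining_vh < 0:
        raise ValueError("ramp already exceeds the target volt-hours")
    schedule.append((final_voltage, remaining_vh, remaining_vh / final_voltage))
    return schedule


# Ramp from step 4: (voltage in V, volt-hours at that voltage)
ramp = [(100, 300), (300, 600), (600, 600), (1000, 1000), (2000, 2000)]
steps = ief_schedule(ramp)
for volts, vh, hours in steps:
    print(f"{volts:>6.0f} V {vh:>8.0f} Vh {hours:6.2f} h")
print(f"total: {sum(step[2] for step in steps):.1f} h")
```

With this ramp, the first five steps use 4500 Vh in about 8 h, and the final phase supplies the remaining 75,500 Vh, which takes about 21.6 h at 3500 V.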

3.6. Microscale Solution IEF: ZOOM®

To reduce artifacts typical of narrow-range IPG strips (e.g., streaking, distortion, and loss of protein spots), microscale solution IEF (MicroSol-IEF; e.g., ZOOM®, Invitrogen) can be performed before running 2D gels (3) (see Fig. 2). MicroSol-IEF is a preparative solution-phase IEF apparatus whose chamber is partitioned by membrane disks of defined pH (15,16). With MicroSol-IEF, 2.5–3.0 mg plasma protein can be loaded and efficiently fractionated by pI into five separate chambers.

1. Add 2 μL of 99% N,N-dimethylacrylamide (DMA) to the 400-μL sample (see Subheading 3.4, step 2) for alkylation, and incubate the sample on a rotary shaker for 30 min at room temperature (adapted from the manufacturer's instructions).

2. Add 4 μL of 2 M DTT to quench excess DMA. Centrifuge at 16,000 × g for 20 min at 4°C.

3. Prepare the protein samples: dilute 3 mg protein to 3250 μL with sample buffer. Each chamber of the ZOOM® IEF Fractionator receives 650 μL of diluted sample.

4. Assemble the ZOOM® IEF Fractionator according to the manufacturer's instructions. Six disks (pH 3.0, 4.6, 5.4, 6.2, 7.0, and 10.0) create five fractions spanning pH 3.0–10.0.

5. Add the anode and cathode buffers to the corresponding blank chambers.

6. Remove the sample chamber caps and add 650 μL of protein sample (step 3) to each chamber.

7. Carry out fractionation at 100 V for 20 min, 200 V for 80 min, and 600 V for 80 min (see Note 10). The starting current is approximately 0.6 mA; it rises to approximately 1.2 mA at the beginning of the 200-V step, and the final current is approximately 0.2 mA.

8. Load the focused fractions onto narrow-range IPG strips for 2DE.
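Steps 3 and 4 define the layout of the fractionator: adjacent disks bound each sample chamber, and the diluted sample is divided equally among the chambers. The sketch below derives the five pH windows and the per-chamber load, and adds a hypothetical helper that reports which chamber a protein of known pI would be expected to enter. It ignores the membrane losses mentioned in Subheading 3.4, and the boundary rule (a pI exactly at a disk pH goes to the higher window) is an arbitrary choice.

```python
def zoom_layout(disk_phs, protein_mg, volume_ul):
    """Pair adjacent disks into pH windows and split the sample evenly."""
    phs = sorted(disk_phs)
    windows = list(zip(phs[:-1], phs[1:]))  # one window per sample chamber
    n = len(windows)
    return windows, protein_mg / n, volume_ul / n


def chamber_for_pi(pi, windows):
    """Index of the window containing pi, or None if it lies outside all windows."""
    for i, (low, high) in enumerate(windows):
        if low <= pi < high:
            return i
    return None


windows, mg_each, ul_each = zoom_layout([3.0, 4.6, 5.4, 6.2, 7.0, 10.0], 3.0, 3250.0)
print(windows)
print(f"{ul_each:.0f} uL and {mg_each:.1f} mg per chamber")
print("pI 5.8 ->", windows[chamber_for_pi(5.8, windows)])
```

For 3 mg protein in 3250 μL, each of the five chambers receives 650 μL containing about 0.6 mg, and a protein with a pI of 5.8 would be expected in the pH 5.4–6.2 chamber.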



Fig. 2. Narrow-range 2DE images of plasma proteins after depletion of the six most abundant proteins with MARS. After microscale solution IEF (ZOOM®), the pH 5.5–6.2 fraction was separated on pH 5.5–6.7 IPG strips in a second isoelectric focusing step and then resolved on a 9–16% gel. (A) Whole 2DE images of pH 3–10 NL and pH 5.5–6.7 gels. (B) One spot on the pH 3–10 NL gel can be resolved into two or more spots in narrow-range 2DE. (C) Many spots hidden on the pH 3–10 NL gel appear in narrow-range 2DE of normal and HCC plasma.



3.7. Fractionation of the Plasma Samples by Free Flow Electrophoresis

To identify and isolate biomarker candidates from the plasma of patients with HCC by 2DE, high resolution is critical, and it can be achieved with narrow-range IEF. However, narrow-range IEF requires loading considerably more protein (e.g., tenfold or more) onto the IPG strip, because proteins outside the strip's pH range are lost. Prefractionation or depletion is therefore required before both IEF and the 2D gel. FFE is useful for prefractionating plasma because it yields specific fractions of interest (e.g., by pI or density). For example, if the pI of the target proteins is known, FFE can be used to collect only the fractions that cover that pI. Here we describe one of several procedures for prefractionating plasma samples by FFE.

1. Dissolve the TCA-precipitated MARS flow-through fractions (~2.0 mg) in 500 μL separation medium 3 (see Subheading 2.5) (adapted from the manufacturer's instructions).

2. Add traces of the red acidic dye 2-(4-sulfophenylazo)-1,8-dihydroxy-3,6-naphthalenedisulfonic acid (SPADNS, Aldrich) so that migration of the sample within the separation chamber can be monitored visually.

3. Carry out FFE at 10°C with the following media applied at the indicated inlets: anodic stabilization medium (Inlet I1), separation medium 1 (Inlet I2), separation medium 2 (Inlets I3–I5), separation medium 3 (Inlet I6), cathodic stabilization medium (Inlet I7), and counter-flow medium (Inlet I8).

4. Apply the anodic circuit electrolyte to the anode and the cathodic circuit electrolyte to the cathode.

5. Assemble the ProTeam™ FFE instrument (Tecan). Use a 0.4-mm spacer for the separation chamber, a flow rate of approximately 60 mL/h (Inlets I1–I7), and a voltage of 1500 V, which gives a current of 20–24 mA.

6. Introduce the sample into the separation chamber through the cathodal inlet at approximately 0.7 mL/h (4,17). The residence time in the separation chamber is approximately 33 min.

7. Collect the fractions into polypropylene 96-deep-well plates, numbered 1 (anode) through 44 (cathode) (4).

8. Remove glycerol and HPMC by TCA/acetone precipitation and dissolve the proteins in sample buffer.

9. Load the narrow-pH fractions onto IPG strips for 2DE.
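Steps 1, 5, and 6 together determine how long the sample takes to enter the chamber and how much power the run dissipates. The sketch below computes both, assuming the entire 500 μL is introduced at a constant 0.7 mL/h and using P = V × I for the 20–24 mA range given in step 5; the function name is hypothetical.

```python
def ffe_parameters(sample_ml: float, sample_rate_ml_h: float,
                   voltage_v: float, current_ma: float) -> tuple[float, float]:
    """Sample introduction time (min) and electrical power (W)."""
    infusion_min = sample_ml / sample_rate_ml_h * 60.0
    power_w = voltage_v * current_ma / 1000.0  # P = V * I
    return infusion_min, power_w


for current in (20, 24):
    minutes, watts = ffe_parameters(0.5, 0.7, 1500, current)
    print(f"{current} mA: sample enters over ~{minutes:.0f} min at {watts:.0f} W")
```

On these assumptions, the sample takes about 43 min to enter the chamber; adding the ~33-min residence time suggests that collection should cover roughly 76 min from the start of sample introduction, an estimate that ignores tubing dead volumes.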

3.8. Preparation of 2D Gels

1. Assemble the glass plates (separated by two 1.5-mm spacers positioned along the sides) and thin plastic sheets in the multicasting chamber (20).

2. Prepare gel solution for 10 gels (20 × 20 cm, 1.5-mm spacers, 9–16% gradient): heavy solution (66.7 mL of 5× Tris-HCl buffer, 75 mL of 40% acrylamide stock solution, 0.7 mL of 10% ammonium persulfate (APS), 70 μL TEMED, and 191.7 mL of 50% glycerol) and light solution (66.7 mL of 5× Tris-HCl buffer, 141.7 mL of 40% acrylamide stock solution, 0.7 mL of 10% APS, 70 μL TEMED, and 125 mL distilled water).

3. Assemble the gradient maker and peristaltic pump. Pour the light gel solution into the mixing chamber (the chamber closer to the casting chamber) and the heavy gel solution into the reservoir chamber of the gradient maker. Start the magnetic stirrer in the mixing chamber. Run the peristaltic pump until the gel solution reaches 0.5–1.0 cm below the top of the glass plates (~5 min). Check that the flow rate is 100–120 mL/min.

4. After pouring, overlay the gel solution with distilled water to exclude air and to ensure a level gel surface.

5. Allow the gels to polymerize overnight at room temperature.
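The final concentrations in the gradient solutions can be checked by dividing each component's contribution by the total batch volume. The sketch below does this for the heavy solution in step 2 and estimates the pumping time for both solutions at the flow rate given in step 3. It assumes volumes are additive, and the helper names are hypothetical.

```python
def final_percent(stock_pct: float, stock_ml: float, *other_ml: float) -> float:
    """Final % of a component after mixing stock_ml of stock with other volumes."""
    total_ml = stock_ml + sum(other_ml)
    return stock_pct * stock_ml / total_ml


def pour_time_min(total_ml: float, rate_ml_min: float) -> float:
    """Time to pump total_ml at a constant flow rate."""
    return total_ml / rate_ml_min


# Heavy solution (step 2): Tris-HCl, 40% acrylamide, 10% APS, TEMED, 50% glycerol
heavy_acrylamide = final_percent(40.0, 75.0, 66.7, 0.7, 0.07, 191.7)
heavy_glycerol = final_percent(50.0, 191.7, 66.7, 75.0, 0.7, 0.07)

# Heavy and light solutions have the same total volume
both_ml = 2 * (66.7 + 75.0 + 0.7 + 0.07 + 191.7)

print(f"heavy: {heavy_acrylamide:.1f}% acrylamide, {heavy_glycerol:.1f}% glycerol")
print(f"pumping {both_ml:.0f} mL: {pour_time_min(both_ml, 120):.1f}-"
      f"{pour_time_min(both_ml, 100):.1f} min")
```

On these assumptions, the heavy solution contains about 9% acrylamide and 29% glycerol, and pumping the combined ~668 mL at 100–120 mL/min takes roughly 5.6–6.7 min.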

3.9. Equilibration of the Sample and Running of the Gel

To solubilize the focused proteins and saturate them with SDS, the IPG strips must be soaked in SDS equilibration buffer. This step is analogous to boiling the sample in SDS buffer before SDS-PAGE. The reducing agents dithiothreitol (DTT) and tributylphosphine (TBP) reduce disulfide bonds to free sulfhydryls (on cysteine residues), and alkylating agents such as iodoacetamide (IAA) prevent reoxidation of the free sulfhydryl groups (21).

1. Immediately before use, add approximately 158 μL TBP in 1 mL isopropanol to 100 mL SDS equilibration buffer, and sonicate in a bath sonicator until the solution becomes transparent (see Note 11); this is the TBP equilibration buffer.

2. Add 15 mL TBP equilibration buffer to each strip (gel side up) and shake gently on an orbital shaker for 25 min (TBP equilibration) (see Note 12).

3. Briefly rinse the IPG strip with 1× gel buffer, place it on top of the second-dimension gel, and pour the agarose embedding solution (molten agarose containing a trace of BPB) over it (see Note 13).

4. Run SDS-PAGE at 40 mA per gel until the BPB dye reaches the bottom of the gel, keeping the temperature at 10°C. The total run time for 20 × 20 cm gels is approximately 6 h.
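Note 12 gives the DTT/IAA alternative to TBP equilibration as masses per 50-mL aliquot. The sketch below converts those amounts to molar concentrations, assuming molecular weights of 154.25 g/mol (DTT) and 184.96 g/mol (IAA) and that dissolving the solids does not change the 50-mL volume.

```python
MW_G_PER_MOL = {"DTT": 154.25, "IAA": 184.96}


def millimolar(grams: float, mw: float, volume_ml: float) -> float:
    """Concentration in mM of `grams` of solute dissolved in `volume_ml` mL."""
    return grams / mw / (volume_ml / 1000.0) * 1000.0


dtt_mM = millimolar(1.0, MW_G_PER_MOL["DTT"], 50.0)   # 1 g DTT per 50 mL
iaa_mM = millimolar(1.25, MW_G_PER_MOL["IAA"], 50.0)  # 1.25 g IAA per 50 mL

print(f"DTT: {dtt_mM:.0f} mM, IAA: {iaa_mM:.0f} mM")
```

This corresponds to roughly 130 mM DTT (2%, w/v) and 135 mM IAA (2.5%, w/v).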

3.10. Coomassie Brilliant Blue G-250 Staining

1. Fix the proteins in the gel in 200 mL fixing solution for 1 h.

2. Decant the fixing solution and stain the gel overnight in Coomassie brilliant blue G-250 staining solution.

3. Decant the staining solution.

4. Wash the gel in distilled water at least three times over more than 4 h.

5. Scan the gel, then wrap it in plastic and store it at 4°C.



3.11. 2D Gel Image Analysis

1. Import the gel image (12–16-bit TIFF recommended) and convert it into an ImageMaster file (*.mel).

2. Detect the protein spots and determine the volume and percentage volume (%Vol) of each spot. The percentage volume is a normalized value that remains relatively independent of irrelevant variations between gels, particularly those caused by differences in experimental conditions.

3. Select the differentially displayed protein spots (see Fig. 3).
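The normalization in step 2 and the selection in step 3 can be illustrated with a short sketch. It assumes that %Vol is each spot's volume divided by the summed volume of all detected spots in the same gel, multiplied by 100, and it flags spots whose %Vol ratio between two matched gels reaches a hypothetical twofold threshold. The spot names and volumes are invented; a real comparison would use replicate gels and statistical testing rather than a single ratio.

```python
def percent_volume(spot_volumes: dict[str, float]) -> dict[str, float]:
    """%Vol of each spot relative to the total volume of all spots in the gel."""
    total = sum(spot_volumes.values())
    return {spot: 100.0 * vol / total for spot, vol in spot_volumes.items()}


def differential_spots(control: dict[str, float], disease: dict[str, float],
                       min_fold: float = 2.0) -> dict[str, float]:
    """Matched spots whose %Vol ratio (disease/control) changes at least min_fold."""
    pc, pd = percent_volume(control), percent_volume(disease)
    hits = {}
    for spot in pc.keys() & pd.keys():  # only spots matched in both gels
        ratio = pd[spot] / pc[spot]
        if ratio >= min_fold or ratio <= 1.0 / min_fold:
            hits[spot] = round(ratio, 2)
    return hits


# Invented raw spot volumes; the two gels differ in overall intensity
normal = {"s1": 100.0, "s2": 300.0, "s3": 600.0}
hcc = {"s1": 400.0, "s2": 300.0, "s3": 1300.0}
print(differential_spots(normal, hcc))
```

Because each gel is normalized to its own total, the twofold overall intensity difference between the two example gels does not by itself produce a hit; only spots whose share of the total changes are flagged.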

3.12. Destaining, In-Gel Deglycosylation, and In-Gel Tryptic Digestion

Most plasma proteins, including clotting factors, lipoproteins, and antibodies, are glycosylated (22,23), and these carbohydrate-containing proteins play major roles in the normal biological functions of plasma. Because glycopeptides ionize poorly during MS analysis, the attached glycans can lead to incomplete spectral data and low detection sensitivity; a glycan-removal step is therefore necessary for reliable protein identification.

1. Pick (excise) the protein spot with an end-cut yellow pipette tip and transfer the gel piece into a 1.5-mL Eppendorf tube.

2. Wash the gel piece with 100 μL distilled water.

3. Add 50 μL of 50 mM NH4HCO3 (pH 7.8)/ACN (6:4), and shake for 10 min.

4. Repeat step 3 until the Coomassie blue G-250 dye disappears (two to five times).

5. Decant the supernatant and dry the gel piece in a SpeedVac for 10 min (see Note 14).

6. Add 5 μL trypsin (12.5 ng/μL in 50 mM NH4HCO3) and leave the gel piece on ice for 45 min.

7. Add 10 μL of 50 mM NH4HCO3 to the gel piece.

8. Incubate the gel piece at 37°C for 12 h.
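The trypsin amount in step 6 can be related to the 100-μg vial listed in Subheading 2.8. The sketch below computes the trypsin delivered per spot, the volume of 12.5 ng/μL working solution that one vial would make, and the resulting number of digests. It ignores pipetting dead volume and any activity lost during storage.

```python
def trypsin_budget(vial_ug: float = 100.0, working_ng_per_ul: float = 12.5,
                   ul_per_spot: float = 5.0) -> tuple[float, float, int]:
    """Return (ng per spot, mL of working solution per vial, spots per vial)."""
    ng_per_spot = working_ng_per_ul * ul_per_spot
    working_ml = vial_ug * 1000.0 / working_ng_per_ul / 1000.0
    spots_per_vial = int(vial_ug * 1000.0 // ng_per_spot)
    return ng_per_spot, working_ml, spots_per_vial


ng, ml, spots = trypsin_budget()
print(f"{ng} ng per spot; {ml} mL working solution; {spots} digests per vial")
```

With the values in step 6, each spot receives 62.5 ng trypsin, and one vial yields 8 mL of working solution, enough for about 1600 digests.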

3.13. Desalting of Peptides and MALDI Plating

1. Resin packing: Twist the column body (GELoader tip, Eppendorf) near the end of the tip to constrict it, and push in the resin suspension [Poros R2:Oligo R3 (2:1) in 70% (v/v) ACN; a 1:1 ratio is sometimes more efficient] with a 1-mL syringe. A packed resin length of 2–3 mm is suitable (18,19).

2. Column equilibration: Add 20 μL of 2% (v/v) formic acid and push the solution through the column with the 1-mL syringe.

3. Peptide binding: Add the peptide solution (the supernatant from the digestion in Subheading 3.12, step 8; approximately 10–12 μL) and push it through the column with the syringe.

4. Washing: Add 20 μL of 2% (v/v) formic acid and push it through the column with the syringe.

5. MALDI spotting: Add 1 μL matrix solution [10 mg/mL CHCA in 70% (v/v) ACN and 2% (v/v) formic acid] and spot the eluted peptide and matrix mixture directly onto the MALDI plate (Opti-TOF™ 384-well insert, Applied Biosystems).

6. Column reuse: Add 20 μL of 100% ACN, push it through the column with the syringe, and repeat step 2 to re-equilibrate the column.

Fig. 3. Detection of PTMs on 2DE gels of plasma proteins. (A) 2DE images of plasma proteins depleted of the six most abundant proteins with MARS, untreated (left) and treated with alkaline phosphatase (AP) (right). (B) One of the proteins differentially displayed after AP treatment. (C) Data-dependent neutral loss scan spectrum of the sequence KEPCVESLVSpQYFQTVTDYGKD, corresponding to the phosphorylated apolipoprotein A-II precursor.

3.14. MALDI-TOF and Peptide Mass Fingerprinting

1. Acquire peptide mass fingerprints (PMF) with the Voyager DE-PRO or 4800 MALDI-TOF/TOF mass spectrometer (Applied Biosystems).

2. Obtain the mass spectra in reflectron/delayed-extraction mode with an accelerating voltage of 20 kV, summing data from 500 laser pulses (4800 MALDI-TOF/TOF) or 100 laser pulses (Voyager DE-PRO).

3. Calibrate the spectra internally with the trypsin autolysis peaks (m/z 842.5090 and 2211.1046), and obtain monoisotopic peptide masses with Data Explorer 3.5 (PerSeptive Biosystems).

4. Search the Swiss-Prot and NCBInr databases with the Mascot search engine (Matrix Science; http://www.matrixscience.com).
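Step 3 recalibrates each spectrum with two trypsin autolysis peaks. Data Explorer's own calibration model is not described here, so the sketch below assumes the simplest case: a linear correction, m_corrected = a·m_observed + b, fitted exactly through the two reference peaks, plus a helper that reports mass error in ppm. The observed peak positions in the example are hypothetical.

```python
AUTOLYSIS_MZ = (842.5090, 2211.1046)  # reference m/z values from step 3


def two_point_calibration(observed, reference=AUTOLYSIS_MZ):
    """Return a function mapping observed m/z to corrected m/z (linear fit)."""
    (o1, o2), (r1, r2) = observed, reference
    slope = (r2 - r1) / (o2 - o1)
    intercept = r1 - slope * o1
    return lambda mz: slope * mz + intercept


def ppm_error(measured: float, theoretical: float) -> float:
    """Mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


# Hypothetical observed positions of the two autolysis peaks
calibrate = two_point_calibration((842.5200, 2211.1300))
print(f"error before calibration: {ppm_error(842.5200, AUTOLYSIS_MZ[0]):.1f} ppm")
print(f"peak at 1500.0000 -> {calibrate(1500.0000):.4f}")
```

Time-of-flight instruments are normally calibrated on flight time rather than directly on m/z, so this linear correction is only an approximation over a limited mass range.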

3.15. Profiling of PTMs on Selected Spots

Although shotgun proteomics approaches that use labeling techniques (e.g., SILAC and iTRAQ) are useful for high-throughput protein identification, they have many limitations for PTM analysis. In contrast, 2D gels usually display modified forms or isoforms of a protein as separate spots at different positions on a single gel, and these spots can then be characterized further by high-resolution LC-MS/MS. For example, in a typical 2D gel of plasma, the phosphorylated forms of a protein can often be detected as a horizontal ladder of spots with different pIs, because each added phosphate group makes the protein more acidic. Figure 3 shows the localization of the exact phosphorylation site of the apolipoprotein A-II precursor. As seen in the figure, spots from alkaline phosphatase (AP)-treated plasma clearly differ from those of untreated plasma: in the treated sample, the spots have shifted to a more basic position. The phosphorylation sites of these proteins can then be determined by multistage MS (MS2 and MS3). Here, we describe the procedure for identifying phosphorylated proteins by 2DE coupled to MS.

1. Desalt the MARS-treated (high-abundance-protein-depleted) plasma sample with an Amicon Ultra-15 centrifugal filter (5-kDa molecular weight cutoff; Millipore).

2. Dephosphorylate the sample overnight at 37°C with 24 ng/μL calf intestinal AP in 0.4% ammonium bicarbonate (NH4HCO3) buffer (pH 8.5).

3. Stop the reaction by freeze-drying the sample for further analysis.



4. Perform 2DE, spot picking, peptide extraction, and desalting under the same conditions as above (see Subheadings 3.8–3.13).

5. Dissolve the extracted and desalted peptides in 10 μL of LC-MS/MS solution [0.4% (v/v) acetic acid and 0.005% (v/v) heptafluorobutyric acid (HFBA)].

6. Perform nano LC-MS/MS analysis on an Agilent nano HPLC system (Agilent) coupled to an LTQ mass spectrometer (Thermo Electron, San Jose, CA).

7. The capillary column used for LC-MS/MS analysis (150 mm × 0.075 mm) was obtained from Proxeon (Odense M, Denmark) and slurry-packed in-house with 5-μm, 100-Å pore size Magic C18 stationary phase (Michrom Bioresources, Auburn, CA).

8. Mobile phase A for LC separation was 0.4% acetic acid and 0.005% HFBA in deionized water (Cascada, Pall, USA), and mobile phase B was 0.4% acetic acid and 0.005% HFBA in ACN.

9. The sample obtained after Oasis HLB (Waters, USA) desalting and Nanosep (Pall, USA) filtering was loaded onto the LC column.

10. The gradient consisted of linear increases from 5% to 35% B over 50 min, from 40% to 60% B over 20 min, and from 60% to 80% B over 5 min. The flow rate was maintained at 300 nL/min.

11. Mass spectra were acquired by data-dependent acquisition, with a full mass scan (m/z 400–1800) followed by MS/MS scans. Each MS/MS scan was the average of three microscans on the LTQ.

12. The ion transfer tube temperature was held at 200°C, and the spray voltage was 2.0–3.0 kV. The normalized collision energy was set at 35% for MS2.

13. To determine the exact position of the phosphorylation site, an automated neutral loss MS3 scan was used; this relies on the characteristic behavior of phosphopeptides during MS/MS in an ion trap. If the MS/MS scan shows the neutral loss of a phosphate group (as phosphoric acid, 98 Da), observed as an m/z difference of 98, 49, or 32.7 for precursor charge states 1+, 2+, and 3+, respectively, an MS3 scan of the resulting product ion is triggered (see Note 15).
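The trigger logic in step 13 can be sketched as follows. The code computes the m/z shift produced by loss of phosphoric acid (monoisotopic mass about 97.977 Da) for charge states 1+ to 3+, and checks whether a product ion lies at that offset below the precursor. The 0.5 m/z tolerance and the example ion values are hypothetical; on the LTQ this check is performed by the instrument's data-dependent acquisition software, not by user code.

```python
H3PO4_DA = 97.9769  # monoisotopic mass of phosphoric acid (H3PO4)


def neutral_loss_offsets(max_charge: int = 3) -> dict[int, float]:
    """m/z shift caused by losing H3PO4 from a precursor of charge z."""
    return {z: H3PO4_DA / z for z in range(1, max_charge + 1)}


def ms3_target(precursor_mz: float, charge: int, product_mzs,
               tol: float = 0.5):
    """Return the product ion matching an H3PO4 loss (the MS3 target), or None."""
    expected = precursor_mz - H3PO4_DA / charge
    matches = [mz for mz in product_mzs if abs(mz - expected) <= tol]
    # If several ions fall within the tolerance, take the closest one
    return min(matches, key=lambda mz: abs(mz - expected)) if matches else None


print({z: round(shift, 1) for z, shift in neutral_loss_offsets().items()})
print(ms3_target(1250.0, 2, [1100.4, 1201.0]))  # hypothetical 2+ precursor
```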

4. Notes

1. Donors were tested and found negative for HIV-1 and HIV-2 antibodies, HIV-1 antigen, hepatitis B surface antigen (HBsAg), antibodies to hepatitis B core antigen (anti-HBc), antibodies to hepatitis C virus (anti-HCV), HTLV-I/II antibodies (anti-HTLV-I/II), and syphilis.

2. No protease inhibitor cocktails were used. This procedure required 2 h at 2–6°C.

3. Approximately 10% of the sample was left at the bottom of the secondary tube to ensure that no cellular material was collected.

4. If excess protease inhibitor is used, the resolution of protein spots in the 2D gel decreases and the spot borders become unclear.

5. If protein pellets are dried completely in a SpeedVac, they may not redissolve in sample buffer. Air-dry the pellets for 15–30 min instead.



6. To ensure that all components are completely dissolved, warm the sample buffer to room temperature. Do not heat sample buffer that contains proteins: urea decomposes to isocyanate, which carbamylates proteins and can cause charge heterogeneity.

7. Cup loading: Rehydrate the IPG gel strip with 350 μL sample buffer without protein, and load 100 μL of protein sample in sample buffer into the sample cup. Cup loading tolerates high salt concentrations better.

8. Apply low voltages (100 V) at the beginning of the run for 3–5 h. Replace the<br />

filter paper (for desalting purposes) at the end of the run.<br />

9. IPG strips that are not used for the second-dimension (2-D) run immediately after<br />

the first-dimension (1-D) run can be stored at –80°C for several months.<br />

10. When current is flowing through the system, the BPB (bromophenol blue) dye<br />

migrates toward the anode reservoir and eventually turns the anode buffer<br />

yellow.<br />

11. Concentrated TBP reacts violently with organic matter. All procedures for<br />

preparing TBP stock solutions should be done in a fume hood. Store the TBP<br />

stock solution in the dark at 4°C. Do not store it longer than 2 weeks.<br />

12. DTT/IAA equilibration procedure: For reduction and alkylation of proteins,<br />

the DTT/IAA equilibration procedure can be used instead of the TBP<br />

equilibration procedure. Divide the SDS equilibration buffer into two 50-mL<br />

aliquots. Add 1 g DTT to the first aliquot and 1.25 g IAA to the second aliquot.<br />

Add 10 mL of the DTT equilibration buffer to each strip and place on a shaker<br />

for 10 min. Decant the DTT equilibration buffer and shake with 10 mL of the<br />

IAA equilibration buffer for another 10 min.<br />

13. To prepare the agarose embedding solution, dissolve 1 g of agarose in 100 mL<br />

of small gel buffer and melt it in a microwave on medium power, heating in<br />

short intervals and swirling occasionally until the agarose is completely melted.<br />

14. In-gel deglycosylation: After destaining and before trypsin digestion, glycans<br />

can be removed from glycoproteins to obtain peptides of the highest purity.<br />

Rehydrate gel spots (see Subheading 3.12, step 5) with 10 μL of PNGase F<br />

stock solution (10 μU) and incubate for 3 h at 37°C. Decant the supernatant<br />

containing the released glycans. Wash the gel piece with 50 μL of 50 mM NH4HCO3<br />

(pH 7.8)/ACN (6:4). Dry the gel piece in a Speed Vac.<br />

15. Peptide sequences were identified with the SEQUEST software, using the<br />

following cutoffs: DeltaCn ≥ 0.1 and Rsp ≤ 4; Xcorr ≥ 1.9 for charge state 1+,<br />

Xcorr ≥ 2.2 for charge state 2+, and Xcorr ≥ 3.75 for charge state 3+.<br />
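The acceptance rule in Note 15 can be expressed as a filter over search results. The PSM record and its field names below are hypothetical and do not reproduce SEQUEST's output format; rejecting charge states above 3+ is also an assumption, because the note gives no cutoff for them.

```python
from dataclasses import dataclass

# Charge-dependent Xcorr cutoffs from Note 15
XCORR_MIN = {1: 1.9, 2: 2.2, 3: 3.75}

@dataclass
class PSM:
    """One peptide-spectrum match (hypothetical record layout)."""
    peptide: str
    charge: int
    xcorr: float
    delta_cn: float
    rsp: int

def passes_cutoffs(psm: PSM) -> bool:
    """Apply the DeltaCn, Rsp, and charge-specific Xcorr cutoffs."""
    if psm.charge not in XCORR_MIN:
        return False  # assumption: no cutoff defined, so reject
    return (psm.delta_cn >= 0.1
            and psm.rsp <= 4
            and psm.xcorr >= XCORR_MIN[psm.charge])

if __name__ == "__main__":
    hits = [
        PSM("PEPTIDEK", 2, 2.5, 0.15, 1),  # passes all cutoffs
        PSM("SAMPLER", 3, 3.1, 0.20, 1),   # fails: Xcorr < 3.75 for 3+
        PSM("LESSK", 1, 2.0, 0.05, 2),     # fails: DeltaCn < 0.1
    ]
    print([p.peptide for p in hits if passes_cutoffs(p)])
```

Only the first hypothetical match survives; the other two each fail a single criterion, which is the point of applying all cutoffs jointly.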

Acknowledgments<br />

This study was supported by a grant from the Korean Health 21 R&D project,<br />

Ministry of Health & Welfare, Republic of Korea (A030003 to YKP).


74 Cho et al.<br />

References<br />

1. Putnam, F. W. (ed) (1987) The Plasma Proteins, Academic Press, New York.<br />

2. Anderson, N. L., and Anderson, N. G. (2002) The human plasma proteome: history,<br />

character, and diagnostic prospects. Mol. Cell. Proteomics 1, 845–867.<br />

3. Lee, H. J., Lee, E. Y., Kwon, M. S., and Paik, Y. K. (2006) Biomarker discovery<br />

from the plasma proteome using multidimensional fractionation proteomics. Curr.<br />

Opin. Chem. Biol. 10, 42–49.<br />

4. Cho, S. Y., Lee, E. Y., Lee, J. S., Kim, H. Y., Park, J. M., Kwon, M. S., Park, Y. K.,<br />

Lee, H. J., Kang, M. J., Kim, J. Y., Yoo, J. S., Park, S. J., Cho, J. W., Kim, H. S., and<br />

Paik, Y. K. (2005) Efficient prefractionation of low-abundance proteins in human<br />

plasma and construction of a two-dimensional map. Proteomics 5, 3386–3396.<br />

5. Omenn, G. S., States, D. J., Adamski, M., and Blackwell, T. W. (2005) Overview<br />

of the HUPO Plasma Proteome Project: results from the pilot phase with 35<br />

collaborating laboratories and multiple analytical groups, generating a core dataset<br />

of 3020 proteins and a publicly-available database. Proteomics 5, 3226–3245.<br />

6. States, D. J., Omenn, G. S., Blackwell, T. W., Fermin, D., Eng, J., Speicher, D. W.,<br />

and Hanash, S. M. (2006) Challenges in deriving high-confidence protein identifications<br />

from data gathered by a HUPO plasma proteome collaborative study. Nat.<br />

Biotechnol. 24, 333–338.<br />

7. Yang, Z., Hancock, W. S., Chew, T. R., and Bonilla, L. (2005) A study of<br />

glycoproteins in human serum and plasma reference standards (HUPO) using<br />

multilectin affinity chromatography coupled with RPLC-MS/MS. Proteomics 5,<br />

3353–3366.<br />

8. Wang, Y., Wu, S. L., and Hancock, W. S. (2006) Approaches to the study of<br />

N-linked glycoproteins in human plasma using lectin affinity chromatography<br />

and nano-HPLC coupled to electrospray linear ion trap-Fourier transform mass<br />

spectrometry. Glycobiology 16, 514–523.<br />

9. Görg, A., Boguth, G., Kopf, A., Reil, G., Parlar, H., and Weiss, W. (2002) Sample<br />

prefractionation with Sephadex isoelectric focusing prior to narrow pH range two-dimensional<br />

gels. Proteomics 2, 1652–1657.<br />

10. Wu, T. L. (2006) Two-dimensional difference gel electrophoresis. Methods Mol.<br />

Biol. 328, 71–95.<br />

11. Park, K. S., Kim, H., Kim, N. G., Cho, S. Y., Choi, K. H., Seong, J. K., and<br />

Paik, Y. K. (2002) Proteomic analysis and molecular characterization of tissue<br />

ferritin light chain in hepatocellular carcinoma. Hepatology 6, 1459–1466.<br />

12. Park, K. S., Cho, S. Y., Kim, H., and Paik, Y. K. (2002) Proteomic alterations of the<br />

variants of human aldehyde dehydrogenase isozymes correlate with hepatocellular<br />

carcinoma. Int. J. Cancer 2, 261–265.<br />

13. Rai, A. J., Gelfand, C. A., Haywood, B. C., Warunek, D. J., Yi, J., Schuchard, M. D.,<br />

Mehigh, R. J., Cockrill, S. L., Scott, G. B., Tammen, H., Schulz-Knappe, P.,<br />

Speicher, D. W., Vitzthum, F., Haab, B. B., Siest, G., and Chan, D. W.<br />

(2005) HUPO plasma proteome project specimen collection and handling: towards<br />

the standardization of parameters for plasma proteome samples. Proteomics 5,<br />

3262–3277.



14. Huang, L., Harvie, G., Feitelson, J. S., Gramatikoff, K., Herold, D. A., Allen, D. L.,<br />

Amunngama, R., Hagler, R. A., Pisano, M. R., Zhang, W. W., and Fang, X. (2005)<br />

Immunoaffinity separation of plasma proteins by IgY microbeads: meeting the<br />

needs of proteomic sample preparation and analysis. Proteomics 5, 3314–3328.<br />

15. Herbert, B. and Righetti, P. G. (2000) A turning point in proteome analysis: sample<br />

prefractionation via multicompartment electrolyzers with isoelectric membranes.<br />

Electrophoresis 21, 3639–3648.<br />

16. Miklos, G. L. and Maleszka, R. (2001) Integrating molecular medicine with<br />

functional proteomics: realities and expectations. Proteomics 1, 30–41.<br />

17. Weber, G., Islinger, M., Weber, P., Eckerskorn, C., and Volkl, A. (2004)<br />

Efficient separation and analysis of peroxisomal membrane proteins using free-flow<br />

isoelectric focusing. Electrophoresis 25, 1735–1747.<br />

18. Choi, B. K., Cho, Y. M., Bae, S. H., Zoubaulis, C. C., and Paik, Y. K. (2003)<br />

Single-step perfusion chromatography with a throughput potential for enhanced<br />

peptide detection by matrix-assisted laser desorption/ionization-mass spectrometry.<br />

Proteomics 3, 1955–1961.<br />

19. Gobom, J., Nordhoff, E., Mirgorodskaya, E., Ekman, R., and Roepstorff, P. (1999)<br />

A sample purification and preparation technique based on nano-scale RP-columns<br />

for the sensitive analysis of complex peptide mixtures by MALDI-MS. J. Mass<br />

Spectrom. 24, 105–116.<br />

20. Walsh, B. J., and Herbert, B. R. (1999) Casting and running vertical slab-gel<br />

electrophoresis for 2D-PAGE. Methods Mol. Biol. 112, 245–253.<br />

21. Newhall, W. J. and Jones, R. B. (1983) Disulfide-linked oligomers of the major<br />

outer membrane protein of chlamydiae. J. Bacteriol. 154, 998–1001.<br />

22. Kaufman, R. J. (1998) Post-translational modifications required for coagulation<br />

factor secretion and function. Thromb. Haemost. 79, 1068–1079.<br />

23. Tabas, I. (1999) Nonoxidative modifications of lipoproteins in atherogenesis. Annu.<br />

Rev. Nutr. 19, 123–139.


II<br />

Clinical Proteomics by 2DE and Direct<br />

MALDI/SELDI MS Profiling


5<br />

Analysis of Laser Capture Microdissected Cells<br />

by 2-Dimensional Gel Electrophoresis<br />

Daohai Zhang and Evelyn Siew-Chuan Koay<br />

Summary<br />

Laser capture microdissection (LCM) is a powerful tool for procuring near-pure<br />

populations of targeted cell types from specific microscopic regions of tissue sections,<br />

by overcoming problems due to tissue heterogeneity and minimizing intermixture and<br />

contamination by other cell types. The combination of LCM with various proteomic<br />

technologies has enabled high-throughput molecular analysis of human tumors, and<br />

provided critical tools in the search for novel disease markers and therapeutic targets. As<br />

an example, we describe the application of LCM in dissecting the tumor cells in breast<br />

cancer for macromolecular extraction and subsequent protein separation by 2-dimensional<br />

gel electrophoresis (2-D GE). The protocols and the key issues involved in preparing<br />

ethanol-fixed paraffin-embedded tissue blocks and microscopic sections, microdissecting<br />

the cells of interest using the PixCell II LCM system, extracting and separating the cellular<br />

proteins by 2-D GE, and preparing selective proteins for peptide mass analysis by mass<br />

spectrometry, are discussed. The aim is to provide a practical guide to performing high-throughput<br />

microdissection of target cells and gel-based proteomics, which can be adapted<br />

to research in cancer formation and growth.<br />

Key Words: laser capture microdissection; 2-dimensional gel electrophoresis; breast<br />

cancer; proteomics; silver staining.<br />

1. Introduction<br />

Cellular proteins (collectively known as the “proteome”) are less susceptible<br />

than the transcriptome to experimental artifacts arising from the rigors of tissue<br />

collection and processing, and advances in global protein expression analysis<br />

From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols<br />

Edited by: A. Vlahou © Humana Press, Totowa, NJ<br />

77


78 Zhang and Koay<br />

(expression proteomics) have been used in mapping cellular pathways, identifying<br />

the molecular alterations associated with disease onset and progression<br />

and searching for potential tumor markers or drug targets in human disease,<br />

especially in cancer. However, to obtain cell-specific protein profiles, homogeneous<br />

or near-pure populations of the cells of interest, free from contamination<br />

by adjacent cell types, are prerequisites. Laser capture microdissection (LCM)<br />

was developed to enable the procurement of near-pure populations of the target<br />

cells with greater speed and precision than is possible with manual dissection<br />

methods. LCM permits selective transfer of specific cell types, under direct<br />

microscopic visualization, from complex tissues onto a polymer film that is<br />

activated by laser pulses, whilst retaining their morphology. The homogeneity<br />

of encapsulated cells can be verified microscopically. With these inherent<br />

advantages, LCM has become a valuable research tool and has been applied to<br />

cellular and molecular studies of various cancers, including breast (1,2), colon<br />

(3), and liver (4) cancers. It is equally efficacious in procuring cell populations<br />

from both frozen tissues (3,4) and ethanol-fixed, paraffin-embedded tissues<br />

(1,5).<br />

Protein profiles of the LCM-dissected cells can be obtained by two-dimensional<br />

fluorescence difference gel electrophoresis (2-D DIGE) (6),<br />

¹⁶O/¹⁸O isotopic labeling (7), differential iodine radioisotope detection (2),<br />

isotope-coded affinity tag (iCAT) labeling coupled with two-dimensional liquid<br />

chromatography-tandem mass spectrometry (2-D LC-MS/MS) (8), and mass<br />

spectrometry-compatible silver staining (1,9). Protein samples from LCM-dissected<br />

cells can also be applied to reverse-phase protein arrays to analyze key cellular<br />

signaling pathways and metabolic networks (10,11). In this chapter, the in-house<br />

protocols used in the authors’ laboratory for procuring near-pure populations of<br />

breast tumor cells from clinical samples, and for the extraction, isolation, and<br />

analysis of their protein profiles, are described. These include: (1) preparation of<br />

ethanol-fixed paraffin-embedded tissue blocks; (2) microdissection using the PixCell II<br />

LCM system and cellular protein extraction; (3) protein separation by 2-D gel<br />

electrophoresis (2-D GE), silver staining, and gel image analysis; and (4) preparation<br />

of targeted proteins of interest for peptide mass analysis by tandem mass<br />

spectrometry and identification of proteins of interest via database search.<br />

2. Materials<br />

2.1. Histology—Tissue Block and Tissue Section Preparation<br />

1. 70% (v/v), 80% (v/v), 95% (v/v), 100% ethanol<br />

2. Deionized or Milli-Q water (Millipore, Bedford, MA, USA)<br />

3. Hematoxylin solution, Mayer’s (Sigma, St. Louis, MO, USA)<br />

4. Eosin Y solution (Sigma)


Combining LCM with 2-D Gel Electrophoresis 79<br />

5. Complete, mini protease inhibitor cocktail tablets (Roche Applied Science,<br />

Pleasanton, CA, USA)<br />

6. Disposable microtome blades (Feather Safety Razor Co., Ltd., Osaka, Japan)<br />

7. Uncharged microscopic glass slides (Paul Marienfeld GmbH & Co, KG, Lauda-<br />

Koenigshofen, Germany)<br />

8. Sakura Tissue-Tek® V.I.P.™ 5 Jr tissue processor (Sakura Finetek Japan<br />

Co., Ltd., Tokyo, Japan)<br />

9. Paraffin wax: Paraplast® tissue embedding medium; melting point 56–58°C,<br />

store at room temperature (RT) (Structure Probe, Inc., West Chester, PA, USA)<br />

10. Xylenes, Reagent Grade (Sigma)<br />

11. Embedding molds: super metal base molds, 66 mm × 54 mm × 15 mm (Surgipath<br />

Medical Industries, Richmond, IL, USA)<br />

2.2. Laser Capture Microdissection and Protein Sample Preparation<br />

1. PixCell II LCM system (Arcturus Engineering, Mountain View, CA, USA)<br />

2. CapSure transparent plastic caps (Arcturus Engineering)<br />

3. Lysis buffer: 7 M urea, 2 M thiourea, 4% (w/v) CHAPS, 1% Nonidet P-40 (NP-40),<br />

0.5% (v/v) Triton X-100, 50 mM dithiothreitol (DTT), 40 mM Tris-HCl, pH 7.5,<br />

2 mM tributylphosphine (TBP), and 1% (v/v) IPG buffer (pH 3–10). Store at RT.<br />

4. PlusOne 2-D Clean-up Kit (GE Healthcare, San Francisco, CA, USA)<br />

5. Immobilized pH gradient (IPG) buffer (pH 3–10) (GE Healthcare)<br />

6. PlusOne 2-D Quantitation Kit (GE Healthcare)<br />

2.3. Isoelectric Focusing (IEF) and Sodium Dodecyl<br />

Sulfate-Polyacrylamide Gel Electrophoresis (SDS-PAGE)<br />

1. Ettan™ IPGphor™ IEF electrophoresis unit (GE Healthcare)<br />

2. Ceramic strip holders and Ettan™ IPGphor™ Strip Holder Cleaning Solution<br />

(GE Healthcare)<br />

3. Immobiline™ IPG DryStrips (18 cm, pH 3–10, NL) (GE Healthcare)<br />

4. DryStrip Cover Fluid (GE Healthcare)<br />

5. Sample rehydration buffer: 7 M urea, 2 M thiourea, 4% (w/v) CHAPS, 1%<br />

(w/v) NP-40, 1% (v/v) IPG buffer, 50 mM DTT. Add DTT freshly to the<br />

rehydration buffer prior to use. Store at RT.<br />

6. Equilibration buffer A (prepare 10 ml for each strip): 6 M urea, 30% glycerol,<br />

2% SDS, 1% DTT, 50 mM Tris-HCl, pH 8.8. DTT is added to the stock solution<br />

before use.<br />

7. Equilibration buffer B (prepare 10 ml for each strip): 6 M urea, 30% glycerol,<br />

2% SDS, 250 mg (2.5%, w/v) iodoacetamide (IAA), 50 mM Tris-HCl, pH 8.8.<br />

IAA is added to the stock solution before use.<br />

8. 10% SDS-acrylamide gel: 33 ml acrylamide/bis (30% T, 5% C) (Bio-Rad<br />

Laboratories, Hercules, CA, USA), 25 ml Tris (1.5 M, pH 8.8), 1 ml 10% (w/v)<br />

SDS, 0.5 ml 10% (w/v) ammonium persulfate (freshly prepared on the day of<br />

use), 35 μl TEMED (Bio-Rad). Make up to 100 ml with Milli-Q water.



9. Water-saturated isobutanol: Shake equal volumes of Milli-Q water and isobutanol<br />

in a glass bottle and allow the mixture to separate. Transfer the top layer<br />

to a new bottle and store at RT.<br />

10. Agarose sealing solution: Dissolve 0.5% low-melting-point agarose and 0.1%<br />

(w/v) bromophenol blue in 1× SDS-PAGE running buffer. Store at RT.<br />

11. SDS-PAGE running buffer: 25 mM Tris, 198 mM glycine, 0.2% (w/v) SDS,<br />

pH 8.3<br />

12. PROTEAN™ II xi Cell system (Bio-Rad)<br />
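Item 8 above can be checked with the dilution relation C_final = C_stock × V_added / V_final. The Python sketch below reproduces that arithmetic for the 100-ml recipe; treating TEMED as neat (100% v/v) is an assumption, since the recipe gives only a volume.

```python
FINAL_ML = 100.0  # final volume of gel solution (ml), as in item 8

# (component, stock concentration, volume added in ml)
RECIPE = [
    ("acrylamide/bis (% T)", 30.0, 33.0),
    ("Tris (M)", 1.5, 25.0),
    ("SDS (% w/v)", 10.0, 1.0),
    ("ammonium persulfate (% w/v)", 10.0, 0.5),
    ("TEMED (% v/v)", 100.0, 0.035),  # 35 μl; assumed neat (100%)
]

def final_concentrations(recipe, final_ml):
    """C_final = C_stock * V_added / V_final for each component."""
    return {name: stock * vol / final_ml for name, stock, vol in recipe}

def water_to_add(recipe, final_ml):
    """Milli-Q water needed to make the mixture up to final_ml."""
    return final_ml - sum(vol for _, _, vol in recipe)

if __name__ == "__main__":
    for name, conc in final_concentrations(RECIPE, FINAL_ML).items():
        print(f"{name}: {conc:.3g}")
    print(f"water: {water_to_add(RECIPE, FINAL_ML):.1f} ml")
```

The acrylamide works out to 9.9% T, which the recipe rounds to 10%, with 0.375 M Tris and 0.1% SDS; about 40.5 ml of the final volume is water.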

2.4. Silver Staining (see Note 1)<br />

1. Fix solution: 5% acetic acid and 50% ethanol per 100 ml<br />

2. Sensitivity-enhancing solution: 30% (v/v) ethanol, 6.8% (w/v) sodium acetate,<br />

100 μl of 2% (w/v) sodium thiosulphate per 100 ml<br />

3. Silver staining solution: 0.25% (w/v) silver nitrate<br />

4. Development solution: 2.5% (w/v) anhydrous potassium carbonate, 20 μl of 2%<br />

(w/v) sodium thiosulphate per 100 ml, 40 μl of 37% formaldehyde per 100 ml.<br />

5. Stop solution: 4% (w/v) Tris and 2% (v/v) acetic acid per 100 ml<br />

6. Gel store (soak) solution: 1% (w/v) sodium acetate and 10% (v/v) methanol per<br />

100 ml<br />

2.5. Gel Image Analysis<br />

1. Personal Densitometer SI (Molecular Dynamics, Sunnyvale, CA, USA)<br />

2. ImageMaster 2D Elite (Platinum) software (GE Healthcare)<br />

2.6. In-gel Trypsin Digestion and Preparation for MS Analysis<br />

1. Destaining solution: 30 mM potassium ferricyanide and 100 mM sodium<br />

thiosulfate (1:1)<br />

2. 25 mM sodium bicarbonate<br />

3. Dehydrating solution: 50 mM sodium bicarbonate and 50% (v/v) methanol per<br />

100 ml<br />

4. SpeedVac centrifuge (TeleChem International, Inc., Sunnyvale, CA, USA)<br />

5. Digestion solution: 40 ng/μl trypsin sequencing grade (Promega, Madison, WI,<br />

USA) in 20 mM ammonium bicarbonate solution<br />

6. Extraction solution (for hydrophobic peptides): 5% (v/v) trifluoroacetic acid<br />

(TFA) and 50% (v/v) acetonitrile (ACN) per 100 ml<br />

7. Peptide reconstitution solution: 0.1% (v/v) TFA<br />

8. ZipTip C18 columns (Millipore)<br />

9. Eluant: 70% (v/v) ACN and 0.1% TFA per 100 ml<br />

10. Stainless steel MALDI-TOF sample target plates (Applied Biosystems,<br />

Framingham, MA, USA)<br />

11. Alpha-cyano-4-hydroxycinnamic acid (α-CHCA) matrix, 3 mg/ml (Sigma)<br />

12. Applied Biosystems 4700 MALDI-TOF/TOF mass spectrometer



2.7. Database Search for Protein Identification<br />

1. MASCOT software (Matrix Science, London, England)<br />

2. MS-Fit software (http://prospector.ucsf.edu)<br />

3. Methods<br />

The methods described below have been successfully used in the authors’<br />

laboratory for proteomics studies in human breast cancer specimens (1,9) and<br />

can be applied to other cancer tissues as well. Breast tumors and matched<br />

normal tissues were obtained from the Tissue Repository Unit of the National<br />

University Hospital, Singapore, after approval by our Institutional Review<br />

Board.<br />

3.1. Preparation of Tissue Sections for LCM<br />

In this step, frozen tissues can be directly transferred from the –80°C freezer,<br />

where they had been stored after surgical excision and trimming, to a pre-cooled<br />

tube containing 70% (v/v) ethanol and kept on ice. Ethanol-fixed paraffin-embedded<br />

tissue blocks should be prepared as quickly as possible, and the<br />

completed blocks stored at or below 4°C.<br />

1. Fix the frozen tissue overnight in 70% ethanol at 4°C.<br />

2. Place each ethanol-fixed tissue piece, trimmed to appropriate dimensions, into<br />

a pre-cooled cassette within the tissue processor and dehydrate according to the<br />

following procedure: 30 min each in 70% and 80% ethanol at 40°C; 45 min in<br />

95% ethanol at 40°C (twice); 45 min in 100% ethanol at 40°C (twice), and 45 min<br />

in xylene at 40°C (twice) (see Note 2).<br />

3. Embed the specimen in paraffin using embedding molds, with four changes of<br />

paraffin after every 30-min interval.<br />

4. Store the paraffin blocks at or below 4°C if they are not to be processed<br />

immediately for sectioning.<br />

5. Put the block in a –20°C freezer for at least 1 h before cutting sections from it.<br />

6. Cut sections of 8 μm thickness using a standard microtome. Blades should be<br />

changed regularly (see Note 3).<br />

7. Collect the tissue sections on uncharged microscopic glass slides, allow tissue<br />

sections to be air dried, and store the cut sections at or below 4°C.<br />
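The processing schedule in steps 2 and 3 can be written out as a station table, for example to check total run time before programming a processor. The Station layout below is a hypothetical illustration, not the Tissue-Tek programming format; each paraffin change is assumed to last 30 min, and the paraffin temperature is left unspecified because the protocol does not give one.

```python
from typing import NamedTuple, Optional

class Station(NamedTuple):
    reagent: str
    minutes: int
    temp_c: Optional[float]  # None where the protocol states no temperature

# Steps 2-3 of Subheading 3.1: dehydration and clearing at 40 °C,
# then four paraffin changes (assumed 30 min each, temperature not stated)
PROGRAM = (
    [Station("70% ethanol", 30, 40), Station("80% ethanol", 30, 40)]
    + [Station("95% ethanol", 45, 40)] * 2
    + [Station("100% ethanol", 45, 40)] * 2
    + [Station("xylene", 45, 40)] * 2
    + [Station("paraffin", 30, None)] * 4
)

def total_minutes(program) -> int:
    """Sum of station times in minutes."""
    return sum(s.minutes for s in program)

if __name__ == "__main__":
    for i, s in enumerate(PROGRAM, start=1):
        temp = f"{s.temp_c:g} °C" if s.temp_c is not None else "not stated"
        print(f"{i:2d}. {s.reagent:<13} {s.minutes:3d} min  {temp}")
    total = total_minutes(PROGRAM)
    print(f"total: {total} min ({total / 60:g} h)")
```

Counted this way, dehydration, clearing, and paraffin infiltration take 450 min (7.5 h), excluding the overnight fixation in step 1.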

3.2. Staining of Paraffin-embedded Sections<br />

The staining of sections for LCM is similar to that used in most histology<br />

laboratories for morphological assessment. However, using a minimal amount of<br />

the stain to visualize the tissue for microdissection will improve macromolecule<br />

recovery (see Note 4). One tablet of protease inhibitor cocktail should be added



to every 10 ml of each reagent (except xylene), and all reagents prepared using<br />

double deionized water or Milli-Q ® water. Staining should be performed as<br />

close as possible to the scheduled LCM dissection.<br />

1. Deparaffinize the sections in fresh xylene for 5 min, followed by another 5 min<br />

with a fresh change of xylene.<br />

2. Rehydrate for 15 s in each step of the following series: 100% ethanol, 95%<br />

ethanol, 75% ethanol, and deionized water.<br />

3. Stain with Mayer’s Hematoxylin for 30 s.<br />

4. Rinse off excess stain with deionized water for 15 s; repeat rinse a second time.<br />

5. Dehydrate for 15 s in 70% ethanol.<br />

6. Stain with Eosin Y for 5 s.<br />

7. Dehydrate the sections for 15 s (twice) in 95% ethanol, 15 s (twice) in 100%<br />

ethanol, and 60 s in xylene.<br />

8. Air-dry for approximately 2–5 min to allow xylene to evaporate completely (see<br />

Note 5).<br />

9. The tissue is now ready for LCM (see Note 6).<br />

3.3. Laser Capture Microdissection and Protein Sample Preparation<br />

The PixCell II LCM system (Arcturus Engineering, Mountain View, CA,<br />

USA) is used for specific microdissection of tumor cells in our laboratory.<br />

Tissue sections are usually mounted on uncoated glass slides to provide support<br />

for the CapSure cap during microdissection. LCM uses an infrared laser<br />

integrated into a standard microscope. When the desired cells are positioned in<br />

the path of the beam, the investigator fires the laser, and a short pulse heats the<br />

transparent membrane on the cap to ∼90°C for 5 ms. The membrane melts,<br />

binds, and encapsulates the cells of interest, segregating them from the<br />

surrounding cells and connective tissues. Images of the tissues before and after<br />

microdissection and of the captured cells on the cap can be visualized, thus<br />

maintaining an accurate record of each dissection. The laser beam diameter<br />

may be adjusted from 7.5 to 30 μm to procure either single cells or groups of<br />

cells, respectively.<br />
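A rough sense of how much tissue one cap collects can be obtained from the spot diameter. The sketch below assumes circular, non-overlapping shots that each lift a full disk of an 8-μm section (Subheading 3.1, step 6); in practice shots overlap and transfer is incomplete, so the result is an upper bound rather than a measured yield.

```python
import math

def shot_area_um2(spot_diameter_um: float) -> float:
    """Area of one circular laser spot, in square micrometres."""
    return math.pi * (spot_diameter_um / 2) ** 2

def captured_area_mm2(n_shots: int, spot_diameter_um: float) -> float:
    """Upper-bound tissue area for n non-overlapping shots, in mm^2."""
    return n_shots * shot_area_um2(spot_diameter_um) / 1e6

if __name__ == "__main__":
    for d in (7.5, 15, 30):  # beam diameter range quoted for the PixCell II
        print(f"{d:>4} um spot: {shot_area_um2(d):7.1f} um^2 per shot")
    area = captured_area_mm2(2500, 15)  # ~2500 shots at 15 um per cap
    # mm^2 -> um^2, times 8-um section thickness, then um^3 -> nl (1 nl = 1e6 um^3)
    volume_nl = area * 1e6 * 8 / 1e6
    print(f"2500 shots: <= {area:.2f} mm^2, <= {volume_nl:.1f} nl of tissue")
```

At the 15-μm spot size and ∼2500 shots per cap used in this protocol, this gives at most about 0.44 mm² of section, or roughly 3.5 nl of tissue.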

1. Place the slide containing the prepared tissue on the microscope stage. Set the<br />

laser parameters as follows: spot diameter at 15 μm, pulse duration at 5 ms, and<br />

power at 50 mW.<br />

2. Scan the tissue section to locate the desired cells. Dissect out the target cells of<br />

interest and capture all encapsulated cells from each section in quick succession<br />

into one cap. Cells dissected from ∼2500 shots can be captured into one cap (see<br />

Note 7). Figure 1 shows an example of tumor cells before and after microdissection.



Fig. 1. Laser capture microdissection (LCM) of breast tumor cells. The tissue section<br />

on the uncharged glass slide was stained with hematoxylin and eosin and microdissected<br />

with the PixCell II LCM system (Arcturus Engineering). (A) section before LCM; (B)<br />

section after LCM; (C) microdissected cell.<br />

3. Place the LCM cap on an Eppendorf tube containing 100 μl of lysis buffer with<br />

protease inhibitor, invert the tube, and vortex vigorously for 1 min.<br />

4. Place the tube on ice for approximately 20 min, then sonicate the microdissected<br />

sample in a bath sonicator with 5-s pulses separated by 5-s intervals, for a total<br />

of 1 min.<br />

5. Return the sample to ice immediately after sonication.<br />

6. Centrifuge the sample at 16,000 g for 20 min at 4°C and transfer the supernatant<br />

to a new Eppendorf tube.<br />

7. Determine the protein concentration using the PlusOne 2-D Quantitation Kit (GE<br />

Healthcare) and clean up the sample using the PlusOne 2-D cleanup kit (GE<br />

Healthcare), following the manufacturer’s instructions closely.<br />

8. Dissolve the protein pellet in the appropriate volume of sample rehydration buffer<br />

and aliquot according to experimental plans for immediate and later usage. Store<br />

the aliquoted samples at –80°C until analyzed (see Note 8).<br />
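Step 6 specifies relative centrifugal force (g) rather than rotor speed. For centrifuges set in rpm, the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm² (r in cm) converts between the two. The 8.4-cm rotor radius below is a hypothetical value; substitute the radius of your own rotor.

```python
import math

K = 1.118e-5  # constant in RCF = K * r_cm * rpm**2

def rpm_for_rcf(rcf_g: float, radius_cm: float) -> float:
    """Rotor speed (rpm) that produces rcf_g at radius_cm from the axis."""
    return math.sqrt(rcf_g / (K * radius_cm))

def rcf_for_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at radius_cm for a given rotor speed."""
    return K * radius_cm * rpm ** 2

if __name__ == "__main__":
    radius = 8.4  # cm; hypothetical microcentrifuge rotor radius
    speed = rpm_for_rcf(16_000, radius)
    print(f"16,000 g at r = {radius} cm needs about {speed:,.0f} rpm")
```

With that assumed radius, 16,000 g corresponds to roughly 13,000 rpm; the figure shifts with rotor geometry, which is why the protocol states g rather than rpm.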

3.4. First-dimension Gel Electrophoresis (Isoelectric Focusing)<br />

1. Prepare the strip holder for the 18-cm IPG strip (see Note 9).<br />

2. Squeeze a few drops of Ettan IPGphor Strip Holder Cleaning Solution (GE<br />

Healthcare) into the slot and clean thoroughly. Rinse with Milli-Q water and dry<br />

completely.<br />

3. Mix approximately 50 μl of the reconstituted protein samples (∼100–150 μg)<br />

with the appropriate volume of rehydration buffer. The total volume should be<br />

340 μl for one 18-cm IPG strip.<br />

4. Transfer the entire volume of the diluted protein sample into the groove of the<br />

IPG strip holder.<br />

5. Remove the cover from the IPG strip (18 cm, pH 3–10) and place the IPG strip<br />

in the holder such that the gel of the strip is in contact with the sample (i.e., gel



side down). Try to remove any trapped air bubbles by lifting the strip up and<br />

down from one side.<br />

6. Overlay the IPG strip with 2–3 ml of DryStrip Cover Fluid to prevent urea<br />

crystallization and evaporation, and replace the cover on the strip holder.<br />

7. Rehydrate the IPG strip at 20 V for 12 h at 20°C.<br />

8. Perform IEF under the following conditions: 500 V for 1 h, 2000 V for 1 h,<br />

4000 V for 1 h, and 8000 V for 6 h.<br />

9. Once focusing is complete, pour off the oil. The strips can be stored at –20°C for<br />

several weeks, or immediately treated as described below (see Subheading 3.5).<br />
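The rehydration and focusing program in steps 7 and 8 can be summarized as volt-hours (Vh), the product of voltage and time by which IEF runs are often compared. This sketch assumes step-and-hold phases in which the set voltage is reached immediately; on a real instrument, current limits and ramps can keep the voltage below the set point, so the instrument's own Vh readout is the figure to trust.

```python
# (voltage in V, duration in h) from Subheading 3.4, steps 7-8
REHYDRATION = [(20, 12)]
FOCUSING = [(500, 1), (2000, 1), (4000, 1), (8000, 6)]

def volt_hours(program):
    """Sum of V x h over step-and-hold phases (assumes set voltage is reached)."""
    return sum(v * h for v, h in program)

def duration_h(program):
    """Total duration of the program in hours."""
    return sum(h for _, h in program)

if __name__ == "__main__":
    print(f"rehydration: {volt_hours(REHYDRATION):,} Vh over {duration_h(REHYDRATION)} h")
    print(f"focusing:    {volt_hours(FOCUSING):,} Vh over {duration_h(FOCUSING)} h")
```

Under these assumptions focusing accumulates 54,500 Vh over 9 h, plus 240 Vh during the 12-h rehydration.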

3.5. IPG Strip Equilibration<br />

1. Place the focused IPG strips in a container with 10 ml of equilibration buffer A<br />

and shake for 15 min at RT (see Note 10).<br />

2. Transfer the IPG strip to a container with 10 ml of equilibration buffer B and<br />

shake for 15 min at RT (see Note 10).<br />

3. The equilibrated strips can then be processed for second-dimension gel<br />

electrophoresis.<br />

3.6. Second-dimensional SDS-PAGE<br />

Prepare the SDS-polyacrylamide gels in advance, and make sure that the<br />

gels are well polymerized before performing the equilibration of IPG strips.<br />

The proteins must be saturated with SDS during equilibration, and reduced<br />

and alkylated to prevent the formation of disulfide-linked oligomers. In our laboratory, we use<br />

the PROTEAN II xi Cell system (Bio-Rad) for SDS-PAGE.<br />

1. Assemble the gel casting cassette as per the manufacturer’s instructions.<br />

2. Prepare the 10% SDS-acrylamide gel solution (see Note 10) and pour it slowly into the<br />

cassette (two 16 cm × 20 cm glass plates separated by 1.5-mm thick spacers)<br />

until the gel height is approximately 1 cm from the top.<br />

3. Overlay the gel solution with 2 ml of water-saturated isobutanol. It is best to<br />

pour 1 ml of water-saturated isobutanol from one side of the gel and 1 ml on<br />

the other side. Do not pour it all along the gel meniscus.<br />

4. Allow the gel to polymerize for at least 2 h.<br />

5. When polymerization is complete, remove the water-saturated isobutanol and<br />

rinse the gel surface with water.<br />

6. With a pair of forceps, carefully place the equilibrated strip on top of the PAGE<br />

gel, with the acidic side of the strip at left. Cover the strip with melted agarose<br />

sealing solution (see Note 11).<br />

7. Assemble the electrophoresis unit (Bio-Rad) and perform electrophoresis at 15°C<br />

as follows: 40 V for 15 min or until the blue dye enters the gel and then raise<br />

the voltage to 125 V and run the gel overnight or until the blue dye migrates to<br />

the bottom of the gel.<br />

8. Switch off the main power and disassemble the gel cassette.



9. Place the gel in a glass container and wash the gel with Milli-Q water.<br />

10. Stain the gel using the mass spectrometry-compatible silver staining protocol<br />

(see Subheading 3.7).<br />

3.7. Silver Staining and Image Analysis<br />

1. The silver staining protocol as described below is used in the authors’ laboratory<br />

and is highly compatible with protein identification by MALDI-TOF MS and<br />

MALDI-TOF/TOF MS/MS. Adequate washing with Milli-Q water is essential to<br />

reduce the risk of keratin contamination. All the solutions<br />

must be prepared with Milli-Q water, and all the chemical reagents should be<br />

filtered to remove any particles that may cause interference during MS analysis.<br />

All solutions prepared from solid chemicals should be freshly prepared before<br />

performing silver staining. Fix the gel with fixing solution for at least 2 h,<br />

changing the solution afresh at hourly intervals.<br />

2. Briefly wash with Milli-Q water, with constant shaking for about 15 min.<br />

3. Remove the wash and cover the gel with appropriate sensitivity-enhancing<br />

solution and incubate for 1 h, with constant shaking.<br />

4. Wash the gel thoroughly with Milli-Q water for 6 × 15 min, with gentle shaking<br />

and replacing with fresh Milli-Q water after each cycle (see Note 12).<br />

5. Stain the gel with silver staining solution for 30 min.<br />

6. Wash off excess stain from the gel with Milli-Q water (2 × 1 min).<br />

7. Develop the gel for 5–30 min in a developing solution (see Note 13).<br />

8. Add Stop Solution and shake the gel for approximately 20 min to stop the<br />

reaction.<br />

9. Wash the gel using Milli-Q water for 20 min; replace water and repeat the wash.<br />

10. Scan the gel using Personal Densitometer SI, or store the gel in the gel soak<br />

solution for analysis at a later time.<br />

11. Capture the image using ImageMaster 2D Elite software (GE Healthcare). The<br />

image analysis includes spot detection, background subtraction, quantification,<br />

and normalization of spot intensities, according to the instructions from the<br />

software. An example of images showing the differences between the protein<br />

profiles of LCM-microdissected HER-2/neu positive and -negative tumor cells<br />

is shown in Fig. 2.<br />

12. Analyze the image using the software and identify spots that show significant<br />

differences in spot intensity (see Note 14), reflecting differential protein<br />

expression between the two breast cancer subtypes, HER-2/neu positive and<br />

HER-2/neu negative. Only spots that show a more than threefold increase or a more<br />

than threefold decrease in signal intensity, consistently across three replicate sets<br />

of gels, are considered to demonstrate differential protein expression and are<br />

selected for further analysis by MALDI-TOF MS/MS. Proteins with less convincing<br />

evidence of differential expression are unlikely to be useful as biomarkers for early<br />

detection of tumor growth or as therapeutic targets for breast cancer treatment.
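The spot-selection rule in step 12 (a more than threefold change, in the same direction, in all three replicate gel sets) can be sketched as follows. This is a minimal illustration under stated assumptions, not the ImageMaster algorithm: it assumes spots have already been detected and matched across gels, and it normalizes each spot to the total spot volume on its own gel (%Vol) as a stand-in for the software's normalization. The names `percent_volume` and `differential_spots` are hypothetical.

```python
"""Sketch: select spots with a consistent >3-fold change across replicate gel sets.

Assumptions (not from the protocol): spots are already detected and matched
across gels, volumes are background-corrected, and each spot is normalized to
the total spot volume on its own gel (%Vol) before the two subtypes are compared.
"""
from typing import Dict, List

Gel = Dict[str, float]  # spot id -> spot volume


def percent_volume(gel: Gel) -> Gel:
    """Express each spot volume as a percentage of the total spot volume on that gel."""
    total = sum(gel.values())
    return {spot: 100.0 * vol / total for spot, vol in gel.items()}


def differential_spots(pos_gels: List[Gel], neg_gels: List[Gel],
                       fold: float = 3.0) -> List[str]:
    """Return spots whose positive/negative ratio exceeds `fold` (up) or falls
    below 1/`fold` (down) in every replicate gel pair, always in the same direction."""
    pos = [percent_volume(g) for g in pos_gels]
    neg = [percent_volume(g) for g in neg_gels]
    # Only spots matched on every gel of both groups can be compared.
    common = set.intersection(*(set(g) for g in pos + neg))
    selected = []
    for spot in sorted(common):
        ratios = [p[spot] / n[spot] for p, n in zip(pos, neg)]
        up = all(r > fold for r in ratios)
        down = all(r < 1.0 / fold for r in ratios)
        if up or down:
            selected.append(spot)
    return selected


if __name__ == "__main__":
    # Hypothetical spot volumes for three replicate gel pairs (positive vs negative).
    pos_gels = [{"A": 50, "B": 10, "C": 20, "D": 20}] * 3
    neg_gels = [{"A": 10, "B": 50, "C": 20, "D": 20},
                {"A": 10, "B": 50, "C": 20, "D": 20},
                {"A": 10, "B": 50, "C": 35, "D": 5}]
    print(differential_spots(pos_gels, neg_gels))
```

In the hypothetical data, spot D changes fourfold in only one of the three gel pairs, so the replicate-consistency requirement excludes it.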


86 Zhang and Koay<br />

Fig. 2 (image): two silver-stained 2-D gel panels, HER-2/neu-P and HER-2/neu-N; x-axis pI 3–10; molecular mass markers at 92, 50, 35, and 28 kDa; labeled spots AAH025396, P04075, NP004095, P06753-2, AAB49495, P07339, NP001531, NP000627.<br />

Fig. 2. Silver-stained protein profiles of LCM-dissected cells. Protein samples from<br />

HER-2/neu positive and -negative cells were separated on IPG® strips (18 cm,<br />

pH 3–10 NL) and homogeneous SDS-PAGE (10%) gels, and then stained with silver<br />

nitrate. Silver-stained gels were scanned using the Personal Densitometer SI (Molecular<br />

Dynamics), and differentially expressed protein spots were analyzed with ImageMaster<br />

2-D Elite software (GE Healthcare). Accession numbers indicate the protein identities<br />

obtained by MALDI-TOF/TOF tandem mass spectrometry and an NCBInr database<br />

search using Mascot software (Matrix Science, London, UK).<br />

3.8. Trypsin Digestion and Preparation of Peptides for Mass<br />

Spectrometric Analysis<br />

1. Excise the silver-stained protein spots showing significant differential protein<br />

expression, as described above, one at a time, taking care not to include adjacent<br />

spots, and transfer each to an individual tube.<br />

2. Wash with 100 μl of Milli-Q water for 5 min.<br />

3. Add 50 μl of the destaining solution to each tube and incubate for about 20 min on a<br />

platform shaker at RT until the gel pieces become clear.<br />

4. Remove the solution carefully and wash with 100 μl of Milli-Q water.<br />

5. Incubate the gel pieces with 25 mM sodium bicarbonate for 20 min, and then<br />

cut them into smaller pieces with the tip of the transfer pipette. Avoid carryover<br />

and cross-contamination when processing consecutive samples.<br />

6. Rinse the gel pieces with Milli-Q water, briefly centrifuge to pellet them, discard<br />

the wash, and repeat the washing process three times.<br />

7. Add 100 μl of dehydrating solution and incubate for 20 min at RT.<br />

8. Dry the gel pieces in a SpeedVac centrifuge.<br />

9. Re-swell the dried gel pieces with 10–20 μl of Digestion Solution and leave<br />

overnight at 37°C to ensure complete digestion.<br />

10. Extract the resultant hydrophilic peptides first with 10 μl of Milli-Q water for 1 h.


Combining LCM with 2-D Gel Electrophoresis 87<br />

11. Then extract the hydrophobic peptides with Extraction Solution for 2 h.<br />

12. Pool the extracted hydrophilic and hydrophobic peptides and dry the peptide<br />

mixture using the SpeedVac centrifuge.<br />

13. Redissolve the dried peptides in 10 μl of 0.1% (v/v) TFA.<br />

14. Desalt the sample with ZipTip C18 columns (Millipore) and elute the desalted<br />

peptides with 2.5 μl of Eluant.<br />

15. Mix 0.5 μl of the sample eluate with 0.5 μl of CHCA matrix (3 mg/ml) and spot<br />

the mixture onto the stainless steel MALDI-TOF sample target plates.<br />

16. The prepared peptide samples must be kept on ice during transfer to the<br />

core facility for mass spectrometric analysis. In our laboratory, peptide mass<br />

spectra are acquired on an Applied Biosystems 4700 Proteomics Analyzer<br />

MALDI-TOF/TOF mass spectrometer operated in positive-ion reflector mode.<br />

Subsequent MS/MS analyses are performed in a data-dependent manner:<br />

the 10 most abundant ions that fulfill preset criteria are subjected to<br />

high-energy CID at a collision energy of 1 keV, with nitrogen as the<br />

collision gas.<br />
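The data-dependent selection in step 16 can be illustrated with the following sketch. The protocol states only that the 10 most abundant ions meeting preset criteria are fragmented; the minimum signal-to-noise ratio, the contaminant exclusion list, and the 200 ppm exclusion tolerance used here are hypothetical placeholders, not the settings of the 4700 Proteomics Analyzer, and `select_precursors` is an illustrative name.

```python
"""Sketch: data-dependent precursor selection (top-N by intensity).

Hypothetical assumptions: a minimum signal-to-noise ratio and an exclusion list
of known contaminant m/z values (e.g., keratin or trypsin autolysis peaks)
stand in for the unspecified "preset criteria" in the protocol.
"""
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Peak:
    mz: float         # observed m/z (MALDI ions are mostly singly charged)
    intensity: float  # peak intensity (arbitrary units)
    snr: float        # signal-to-noise ratio


def ppm_error(observed: float, reference: float) -> float:
    """Mass error of `observed` relative to `reference`, in parts per million."""
    return (observed - reference) / reference * 1e6


def select_precursors(peaks: Iterable[Peak], exclusion_mz: Iterable[float],
                      top_n: int = 10, min_snr: float = 50.0,
                      tol_ppm: float = 200.0) -> List[Peak]:
    """Return up to `top_n` of the most intense peaks that pass the S/N cutoff
    and do not match any excluded m/z within `tol_ppm`."""
    excluded = list(exclusion_mz)
    candidates = [
        p for p in peaks
        if p.snr >= min_snr
        and not any(abs(ppm_error(p.mz, ref)) <= tol_ppm for ref in excluded)
    ]
    candidates.sort(key=lambda p: p.intensity, reverse=True)  # most abundant first
    return candidates[:top_n]
```

Ranking by intensity only after filtering means a strong contaminant peak never uses up one of the 10 MS/MS slots.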

3.9. Database Search to Match Protein Identities<br />

Database searches were conducted using the MASCOT search engine<br />

(http://www.matrixscience.com). Known contaminant peaks, such as keratin and<br />

trypsin autoproteolysis peaks, were removed before the database search. Protein<br />

identification was performed using the MASCOT software (Matrix Science, London,<br />

UK), and all tandem mass spectra were searched against the NCBInr database,<br />

with a mass tolerance of 200 ppm for precursor mass measurement and<br />

a tolerance window of 0.5 Da for MS/MS.<br />

Searches were performed without constraints on protein molecular weight<br />

(Mr), isoelectric point (pI), or species, allowing for carbamidomethylation<br />

of cysteine and partial oxidation of methionine residues. Up to one missed<br />

tryptic cleavage was allowed in all tryptic-mass searches. Protein scores<br />

greater than 75 were considered significant (p < 0.05).<br />
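To show what these search settings mean in practice, the sketch below performs a simplified peptide mass match: an in silico tryptic digest (cleavage after K or R except before P, at most one missed cleavage), carbamidomethylation added to every cysteine, and a 200 ppm window on [M+H]+ masses. It does not reproduce MASCOT's probability-based scoring. Treating carbamidomethylation as fixed is an assumption, and variable methionine oxidation is omitted for brevity; the function names are hypothetical.

```python
"""Sketch: in silico tryptic digest and [M+H]+ matching within a ppm window.

Not MASCOT's algorithm. Assumes fixed carbamidomethyl Cys; no Met oxidation.
"""
from typing import List, Tuple

# Monoisotopic residue masses (Da).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728
CARBAMIDOMETHYL = 57.02146  # added to every Cys (assumed fixed modification)


def tryptic_peptides(seq: str, missed: int = 1) -> List[str]:
    """Cleave C-terminal to K or R unless followed by P; allow up to `missed` missed cleavages."""
    fragments, start = [], 0
    for i, aa in enumerate(seq):
        if aa in "KR" and (i + 1 == len(seq) or seq[i + 1] != "P"):
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])
    peptides = []
    for i in range(len(fragments)):
        # Join up to `missed` + 1 consecutive fragments.
        for j in range(i, min(i + missed + 1, len(fragments))):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


def mh_plus(peptide: str) -> float:
    """Monoisotopic [M+H]+ of a peptide with carbamidomethylated cysteines."""
    mass = sum(RESIDUE[aa] for aa in peptide) + WATER
    mass += CARBAMIDOMETHYL * peptide.count("C")
    return mass + PROTON


def match(observed: List[float], seq: str, tol_ppm: float = 200.0) -> List[Tuple[float, str]]:
    """Pair each observed [M+H]+ with every tryptic peptide within `tol_ppm`."""
    hits = []
    for pep in tryptic_peptides(seq):
        theo = mh_plus(pep)
        for mz in observed:
            if abs(mz - theo) / theo * 1e6 <= tol_ppm:
                hits.append((mz, pep))
    return hits
```

Near-isobaric peptides show why the tolerance matters. With these masses, AAK ([M+H]+ ≈ 289.187) and GGR (≈ 289.162) differ by only about 87 ppm, so a single observed peak matches both within a 200 ppm window, and fragment (MS/MS) data are needed to tell them apart.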

3.10. Experimental Example: Differential Protein Profiles<br />

between HER-2/neu Positive and -Negative Breast Tumors<br />

We dissected tumor cells from two different subtypes of breast tumors<br />

and compared their protein profiles using the protocols described above.<br />

Figure 2 shows the protein patterns of the LCM-dissected tumor cells, visualized by<br />

silver staining. Note that protein samples pooled from different<br />

cases of the same tumor subtype were used for 2-D GE, because this gel-based<br />

protein visualization technique requires relatively large amounts of protein. More<br />

sensitive detection reagents and protein identification strategies are therefore<br />

needed to produce meaningful results (see Notes 15 and 16). Using


the silver-staining protocol, we detected 500–600 protein spots in the protein<br />

profiles generated by coupling LCM and 2-D GE. Protein spots of interest were<br />

excised, digested with trypsin (Promega), desalted with ZipTip C18<br />

(Millipore), and analyzed by MALDI-TOF/TOF tandem mass spectrometry.<br />

Protein identities, shown in Fig. 2, were obtained by searching the NCBInr<br />

database using the MASCOT software (Matrix Science).<br />

4. Notes<br />

1. All chemical solutions should be filtered through filter paper<br />

(Cat. No. 1001 150, Whatman®, Whatman International Limited, Springfield<br />

Mill, Maidstone, Kent, England) to minimize deposition of precipitates on the<br />

gels during silver staining.<br />

2. Tissue processors in standard histopathology laboratories generally include<br />

formalin fixation as the first step in the paraffin infiltration procedure. It is<br />

important to avoid this step when processing tissues intended for gene<br />

expression and proteome profiling.<br />

3. Consistent LCM transfers have been demonstrated from 5–10 μm thick paraffin-embedded<br />

tissue sections. For a successful LCM transfer, the bond<br />

between the polymer film and the targeted tissue must be stronger than that between the<br />

tissue and the underlying glass slide. Therefore, for most tissue types, sections<br />

should be mounted on uncharged glass slides. To prevent cross-contamination<br />

during sectioning, residual paraffin and tissue fragments should be wiped off<br />

the sectioning blade with xylene between consecutive slides.<br />

If possible, a fresh microtome blade should be used to section each different block.<br />

4. In our hands, hematoxylin and eosin work best on slides prepared for LCM when<br />

diluted to 10% of the standard concentrations used for routine histomorphological work.<br />

With this modification, breast tumor cells can still be clearly visualized and<br />

distinguished from other cell types without impairing their procurement by<br />

LCM. Minimal staining also improves macromolecular<br />

recovery during cellular protein extraction.<br />

5. Complete dehydration and air drying of sections are the main factors influencing<br />

the efficiency of LCM. Prolonged air drying or residual moisture in the<br />

sections appears to inhibit, at least partially, the transfer of cells to the plastic<br />

film.<br />

6. Investigators with limited experience in examining cancer tissue sections<br />

are strongly advised to consult a pathologist at their institution for help in<br />

identifying the target cell types to be microdissected by LCM. It is essential to<br />

avoid contamination with other cell types and dissection of the wrong cells.<br />

7. During microdissection, make sure that there are no irregularities on the tissue<br />

surface in or near the area to be microdissected. It should also be noted that<br />

wrinkles can elevate the LCM cap away from the tissue surface and decrease the


membrane contact during laser activation. After microdissection, use an adhesive pad<br />

to remove cells that may have attached non-specifically to the LCM<br />

cap. A cap-alone control is recommended for each experiment to confirm that<br />

non-specific transfer is not occurring during microdissection. This cap should be<br />

processed together with the tissue-containing caps and serves as a negative<br />

control. For protein separation by 2-D GE, 20 to 30 sections from each tissue<br />

sample are dissected, depending on the percentage of target cells in the full<br />

sections. Generally, 2300–2700 laser pulses are used for each cap. Cells<br />

from at least 50,000 shots (spot diameter 15 μm) are required for each<br />

18-cm gel.<br />

8. Up to 15 mg of protein can be solubilized in 500 μl of the sample rehydration<br />

buffer, but with our breast tumor tissue samples, we usually reconstitute 1–2 mg<br />

of extracted protein in 500 μl (2–4 mg/ml). The reconstituted proteins should be<br />

stored in appropriate aliquots, and only the number of aliquots needed for the<br />

experiment at hand should be removed at any time, to avoid repeated freezing<br />

and thawing of the proteins, which leads to sample<br />

deterioration.<br />

9. IEF is performed using an Ettan IPGphor IEF unit; rehydration<br />

loading of protein samples is used in the authors’ laboratory. IPG strips for<br />

first-dimension separation are commercially available from GE Healthcare and<br />

other suppliers in a range of pH gradients and lengths; choose the strip that<br />

gives the resolution required. The strips should be kept frozen at –20°C and thawed just<br />

before use. The IEF conditions depend on the pH range, so the<br />

manufacturer’s protocol should be consulted. For alkaline pH ranges, cup loading<br />

is required, and DTT in the rehydration buffer should be replaced by an alternative<br />

reagent, such as hydroxyethyl disulfide (HED; DeStreak, GE<br />

Healthcare).<br />

10. It is essential to equilibrate the strips before second-dimension<br />

gel electrophoresis (2-D SDS-PAGE). DTT in buffer A<br />

reduces the disulfide bonds, whereas IAA in buffer B alkylates the resulting<br />

sulfhydryl groups of the proteins. This prevents re-oxidation of the sulfhydryl groups<br />

and streaking of spots during 2-D SDS-PAGE. In addition, SDS coats the proteins<br />

with negative charge and thus primes them for SDS-PAGE. Use<br />

the best-quality SDS available for sample and running buffers that contain SDS.<br />

We recommend C12 grade SDS from Pierce (Rockford, IL,<br />

USA).<br />

11. When placing the strips on top of the gel, ensure that the plastic backing of the<br />

strips is in contact with the glass wall. If necessary, the strips can be trimmed<br />

to fit. When adding the agarose sealing solution, make sure that no air<br />

bubbles are trapped between the IEF strip and the 2-D gel.<br />

12. Wash the gels thoroughly and repeatedly, as recommended, both before and<br />

during the development step, to obtain clearly stained gels.<br />

Formaldehyde should be added to the developing solution immediately before use,


and the suggested concentration should be followed strictly to avoid interference<br />

during MALDI-TOF analysis. During the developing stage, the gel should be<br />

constantly shaken to reduce the background.<br />

13. The developing time depends on the total amount of protein that is used for<br />

2-D separation. With a higher amount of protein, a shorter developing time can<br />

be used, without compromising the aim of visualizing the maximum number of<br />

protein spots.<br />

14. It is important to manually verify spot detection and matching, as the variations<br />

in gel resolution, staining, gel background, and automatic image analysis may<br />

not correctly define the spot contours in every case. This variability and the<br />

complexity of 2-D gel patterns hinder the accurate matching of analogous spots<br />

in different gels.<br />

15. In our experience, approximately 500 to 600 distinct protein spots from the dissected<br />

breast tumor cells can be visualized on silver-stained 2-D PAGE gels. On average,<br />

we can extract approximately 4–6 μg of total cellular protein from 2500 laser<br />

pulses. Our experience is that silver staining of LCM-dissected cell proteins is a<br />

sufficiently sensitive tool for isolating and identifying dysregulated cellular<br />

proteins of high or moderate abundance. For dysregulated proteins of low<br />

abundance, however, the detection limit of this technology would have to<br />

be lowered by other techniques, such as 125I labeling, or biotinylation<br />

and fluorescent dye labeling. In addition, scanning immunoblotting<br />

with class-specific antibodies would allow sensitive detection of<br />

specific subsets of proteins, e.g., all known proteins involved in cell-cycle<br />

regulation.<br />

16. Protein identification by MALDI-TOF, LC-MS/MS, or other techniques is also<br />

limited by the minimal amount of protein required, which is often not<br />

attainable from certain types of biopsy samples. A useful strategy to improve<br />

protein identification is to produce parallel “diagnostic” fingerprints from<br />

microdissected cells and “sequencing” fingerprints from the<br />

whole tissue section of each case. Alignment of the diagnostic and sequencing<br />

2-D gels permits determination of the proteins of interest for subsequent mass<br />

spectrometry or N-terminal sequence analysis.<br />
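The sample-size figures in Notes 7 and 15 can be combined into a quick planning calculation. The sketch assumes that protein yield scales linearly with the number of laser pulses (4–6 μg per 2500 pulses). That is an approximation, and actual yields will vary with tissue type and cell density. The function names are illustrative.

```python
"""Sketch: LCM planning arithmetic from Notes 7 and 15.

Assumes protein yield scales linearly with the number of laser pulses.
"""
import math
from typing import Tuple

SHOTS_PER_GEL = 50_000         # minimum shots per 18-cm gel (Note 7)
UG_PER_REF_SHOTS = (4.0, 6.0)  # ug total protein per REF_SHOTS pulses (Note 15)
REF_SHOTS = 2_500


def caps_needed(shots_per_cap: int, total_shots: int = SHOTS_PER_GEL) -> int:
    """Number of LCM caps needed to reach the required number of shots."""
    return math.ceil(total_shots / shots_per_cap)


def expected_yield_ug(total_shots: int = SHOTS_PER_GEL) -> Tuple[float, float]:
    """Low/high estimate of total protein (ug), assuming linear scaling with pulses."""
    low, high = UG_PER_REF_SHOTS
    return (total_shots * low / REF_SHOTS, total_shots * high / REF_SHOTS)


if __name__ == "__main__":
    for per_cap in (2300, 2500, 2700):  # pulses per cap range from Note 7
        print(per_cap, "pulses/cap ->", caps_needed(per_cap), "caps")
    print("expected protein per gel (ug):", expected_yield_ug())
```

With 2300–2700 pulses per cap, 19–22 caps are needed to reach the 50,000-shot minimum. That fits with the 20–30 sections dissected per sample in Note 7. Under the linear-scaling assumption, 50,000 shots correspond to roughly 80–120 μg of total protein.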

Acknowledgments<br />

The Tumor Repository of the National University Hospital, Singapore,<br />

provided the clinical breast cancer frozen tissues for LCM. The use of the<br />

PixCell II LCM system was courtesy of the Department of Pathology, Yong<br />

Loo Lin School of Medicine, National University of Singapore (NUS). This<br />

work was supported by an Academic Research Fund from the NUS (Grant No.<br />

R-179-000-032) to the authors.


References<br />

1. Zhang, D., Tai, L. K., Wong, L. L., Sethi, S. K., Koay, E. S. (2005) Proteomics of<br />

breast cancer: enhanced expression of cytokeratin 19 in human epidermal growth<br />

factor receptor type 2 positive breast tumors. Proteomics 5, 1797–1805.<br />

2. Neubauer, H., Clare, S. E., Kurek, R., Fehm, T., Wallwiener, D., Sotlar, K., et al.<br />

(2006) Breast cancer proteomics by laser capture microdissection, sample pooling,<br />

54-cm IPG IEF, and differential iodine radioisotope detection. Electrophoresis 27,<br />

1840–1852.<br />

3. Lawrie, L. C., Curran, S., McLeod, H. L., Fothergill, J. E., Murray, G. I. (2001)<br />

Application of laser capture microdissection and proteomics in colon cancer. J.<br />

Clin. Pathol.: Mol. Pathol. 54, 253–258.<br />

4. Ai, J., Tan, Y., Ying, W., Hong, Y., Liu, S., Wu, M., et al. (2006) Proteome<br />

analysis of hepatocellular carcinoma by laser capture microdissection. Proteomics<br />

6, 538–546.<br />

5. Ahram, M., Flaig, M. J., Gillespie, J. W., Duray, P. H., Linehan, W. M.,<br />

Ornstein, D. K., et al. (2003) Evaluation of ethanol-fixed, paraffin-embedded tissues<br />

for proteomic applications. Proteomics 3, 413–421.<br />

6. Greengauz-Roberts, O., Stoppler, H., Nomura, S., Yamaguchi, H., Goldenring,<br />

J. R., Podolsky, R. H., et al. (2005) Saturation labeling with cysteine-reactive<br />

cyanine fluorescent dyes provides increased sensitivity for protein expression<br />

profiling of laser-microdissected clinical specimens. Proteomics 5, 1746–1757.<br />

7. Zang, L., Palmer-Toy, D., Hancock, W. S., Sgroi, D. C., Karger, B. L. (2004)<br />

Proteomic analysis of ductal carcinoma of the breast using laser capture microdissection,<br />

LC-MS, and 16O/18O isotopic labeling. J. Proteome Res. 3, 604–612.<br />

8. Li, C., Hong, Y., Tan, Y. X., Zhou, H., Ai, J. H., Li, S. J., et al. (2004) Accurate<br />

qualitative and quantitative proteomic analysis of clinical hepatocellular carcinoma<br />

using laser capture microdissection coupled with isotope-coded affinity tag and<br />

two-dimensional liquid chromatography mass spectrometry. Mol. Cell. Proteomics<br />

3, 399–409.<br />

9. Zhang, D., Tai, L. K., Wong, L. L., Chiu, L. L., Sethi, S. K., Koay, E. S. (2005)<br />

Proteomic study reveals that proteins involved in metabolic and detoxification<br />

pathways are highly expressed in HER-2/neu-positive breast cancer. Mol. Cell.<br />

Proteomics 4, 1686–1696.<br />

10. Cowherd, S. M., Espina, V. A., Petricoin, E. F. III, Liotta, L. A. (2004) Proteomic<br />

analysis of human breast cancer tissue with laser-capture microdissection and<br />

reverse-phase protein microarrays. Clin. Breast Cancer 5, 385–392.<br />

11. Gulmann, C., Espina, V., Petricoin, E. III, Longo, D. L., Santi, M., Knutsen, T.,<br />

et al. (2005) Proteomic analysis of apoptotic pathways reveals prognostic factors<br />

in follicular lymphoma. Clin. Cancer Res. 11, 5847–5855.


6<br />

Optimizing the Difference Gel Electrophoresis<br />

(DIGE) Technology<br />

David B. Friedman and Kathryn S. Lilley<br />

Summary<br />

Difference gel electrophoresis (DIGE) technology has been used to provide a powerful<br />

quantitative component to proteomics experiments involving 2D gel electrophoresis. DIGE<br />

combines spectrally resolvable fluorescent dyes (Cy2, Cy3, and Cy5) with sample multiplexing<br />

for low technical variation, and uses an internal standard methodology to analyze<br />

replicate samples from multiple experimental conditions with unsurpassed statistical confidence<br />

for 2D gel-based differential display proteomics. DIGE experiments can readily<br />

accommodate sufficient independent (biological) replicate samples to control for the large<br />

interpersonal variation expected from clinical samples. Multivariate statistical<br />

analyses can then be used to assess the global variation in a complex set of independent<br />

samples, filtering out noise from technical and normal biological variation<br />

and thereby focusing on the underlying variation that distinguishes disease states. This<br />

chapter focuses on the design and implementation of the DIGE methodology using<br />

a pooled-sample internal standard in conjunction with the minimal CyDye<br />

labeling chemistry. Notes are also provided for the use of the alternative saturation labeling<br />

chemistry.<br />

Key Words: difference gel electrophoresis; two-dimensional gel electrophoresis;<br />

quantification.<br />

1. Introduction<br />

Human disease phenotypes are a direct result of protein expression and<br />

modification. In many cases, such phenotypes cannot be tied directly to a single<br />

alteration in the genome or resulting proteome, but are likely to be the result<br />

From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols<br />

Edited by: A. Vlahou © Humana Press, Totowa, NJ<br />



94 Friedman and Lilley<br />

of multiple factors. Studying disease at the protein level is challenging, but<br />

as proteins are the mediators of phenotype, the study of protein abundance<br />

on a global scale is required to gain a more complete understanding of the<br />

underlying molecular mechanisms of disease. Proteomics in the clinical setting<br />

is developing rapidly and is expected to have a major impact on the way diseases<br />

are diagnosed, treated, and monitored (1). It has been estimated that there<br />

could be hundreds of thousands of different protein isoforms in a mammalian<br />

cell, but the vast dynamic range of protein abundance results in only the<br />

most abundant species of proteins being observable by quantitative proteomics<br />

approaches unless technically variable biochemical or subcellular fractionation<br />

is employed. The repertoire of techniques and associated hardware, which is<br />

now applied to this field, is expanding exponentially, and although a complete<br />

visualization of the proteome is still beyond reach of any single technique, each<br />

technology platform can provide complementary datasets.<br />

Difference gel electrophoresis (DIGE) has proven to be a powerful quantitative<br />

technology for differential display proteomics on a global level, where<br />

the individual abundance changes for thousands of intact proteins can be simultaneously<br />

monitored in replicate samples over multiple variables with statistical<br />

confidence (see Note 1). This includes quantitative information on protein<br />

isoforms that arise due to post-translational modifications (such as acetylation<br />

or phosphorylation), which result in a change in the isoelectric point of the<br />

protein. This also includes splice variants and the results of protein processing,<br />

all of which are resolved for individual quantification and subsequent analysis<br />

by MS.<br />

DIGE is based on conventional 2D gel technology that is capable of resolving<br />

several thousands of intact proteins first by charge using isoelectric focusing<br />

(IEF) and then by apparent molecular mass using SDS-polyacrylamide gel<br />

electrophoresis (PAGE) (6,7) (see Note 2 and Chapters 4 and 5 by Cho et al.<br />

and Zhang et al., respectively). Importantly, DIGE overcomes many of the<br />

limitations commonly associated with 2D gels such as analytical (gel-to-gel)<br />

variation and limited dynamic range that can severely hamper a quantitative<br />

differential display study. This is accomplished using up to three spectrally<br />

resolvable fluorescent dyes (Cy2, Cy3, and Cy5, referred to as CyDyes) that<br />

enable low- to subnanogram sensitivity with a linear dynamic range of >10⁴, and<br />

then by multiplexing the prelabeled samples into the same analytical run (2D<br />

gel). Multiplexing in this way allows direct quantitative measurements<br />

between the samples coresolved in the same gel, and therefore avoids<br />

the limitations imposed by between-gel comparisons with conventional 2D<br />

gels.<br />

The greatest statistical power of this multiplexing approach comes from the<br />

use of a pooled-sample internal standard composed of an equal aliquot


Optimizing DIGE Technology 95<br />

of every sample in the experiment (see Subheading 1.2.1). With this method,<br />

two dyes (Cy3 and Cy5) are used to individually label two independent samples<br />

from a much larger experiment, and the Cy2 dye is used to label an internal<br />

standard, which is composed of an equal aliquot of proteins from every sample<br />

in the experiment. This pooled-sample internal standard is labeled only once in<br />

bulk to avoid additional technical variation, and enough is made and labeled to<br />

allow for an equal aliquot to be coresolved on each gel. The three differentially<br />

labeled samples are then coresolved on the same 2D gel, after which direct<br />

measurements can be made for each resolved protein using the spectrally<br />

exclusive dye channels without interference from technical variation of the<br />

separation (gel-to-gel variation).<br />

Rather than making direct quantitative measurements between the two<br />

samples in the gel, the measurements are instead made relative to the Cy2 signal<br />

for each resolved protein. The Cy2 signal should be the same for a given protein<br />

across different gels because it came from the same bulk mixture/labeling;<br />

therefore, any difference represents gel-to-gel variation, which can be effectively<br />

neutralized by normalizing all Cy2 values for a given protein across all<br />

gels. Using the Cy2 signal to normalize ratios between gels then allows for the<br />

Cy3:Cy2 and Cy5:Cy2 ratios for each protein within each gel to be normalized<br />

to the cognate ratios from the other gels, encompassing all samples. Each gel<br />

may contain different (and/or replicate) samples in the Cy3 and Cy5 channels,<br />

but all samples can be quantified relative to each other because each protein<br />

from each sample is measured relative to the cognate Cy2 signal from the internal<br />

standard present on each gel. With sufficient replicates, a range of<br />

advanced statistical tests can be applied to highlight proteins of interest<br />

whose change in expression is related to the disease state under investigation.<br />

Because technical noise is low, these vital replicates should be independent<br />

(biological) replicates, as most of the observed variation will arise from the clinical<br />

samples rather than from technical or experimental sources.<br />
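The internal-standard normalization described above can be sketched as follows. Each Cy3 and Cy5 spot volume is divided by the Cy2 volume of the same spot on the same gel, which makes values from different gels comparable, and the standardized values are then pooled by experimental group. The log2 transform, the dye-swap layout, the sample and group labels, and the use of a Welch t statistic are choices made for this sketch and are not prescribed by the text. The function names are hypothetical, and the code does not reproduce any commercial DIGE analysis software.

```python
"""Sketch: Cy2 pooled-internal-standard normalization for DIGE spot data."""
import math
from statistics import mean, variance
from typing import Dict, List, Tuple

# Per gel: spot id -> (Cy2, Cy3, Cy5) spot volumes.
Gel = Dict[str, Tuple[float, float, float]]
# Experimental design: (gel, sample labeled with Cy3, sample labeled with Cy5).
Design = List[Tuple[Gel, str, str]]


def standardized_log_abundance(gel: Gel, spot: str) -> Tuple[float, float]:
    """Return log2(Cy3/Cy2) and log2(Cy5/Cy2) for one spot on one gel.

    Dividing by the Cy2 signal on the same gel cancels gel-specific effects
    that the sample channels share with the pooled standard.
    """
    cy2, cy3, cy5 = gel[spot]
    return math.log2(cy3 / cy2), math.log2(cy5 / cy2)


def group_values(design: Design, groups: Dict[str, str], spot: str) -> Dict[str, List[float]]:
    """Collect the standardized abundances of one spot for each experimental group."""
    values: Dict[str, List[float]] = {}
    for gel, cy3_sample, cy5_sample in design:
        s3, s5 = standardized_log_abundance(gel, spot)
        values.setdefault(groups[cy3_sample], []).append(s3)
        values.setdefault(groups[cy5_sample], []).append(s5)
    return values


def welch_t(a: List[float], b: List[float]) -> float:
    """Welch's t statistic for the difference in means of two groups."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


if __name__ == "__main__":
    # Hypothetical data: three gels with dye swaps; Cy2 differs between gels
    # (gel-to-gel variation) but cancels out in the standardized values.
    design = [
        ({"S1": (100.0, 200.0, 400.0)}, "N1", "T1"),
        ({"S1": (50.0, 200.0, 50.0)}, "T2", "N2"),
        ({"S1": (200.0, 200.0, 1600.0)}, "N3", "T3"),
    ]
    groups = {"N1": "normal", "N2": "normal", "N3": "normal",
              "T1": "tumor", "T2": "tumor", "T3": "tumor"}
    vals = group_values(design, groups, "S1")
    print(vals)
    print("log2 fold change:", mean(vals["tumor"]) - mean(vals["normal"]))
    print("Welch t:", welch_t(vals["tumor"], vals["normal"]))
```

In the example, the raw Cy2 volume of spot S1 varies fourfold between gels. Only the ratios to Cy2 enter the comparison, so this gel-to-gel variation does not affect the group means.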

In a final step, specific proteins of interest are then identified using standard<br />

mass spectrometry (MS) approaches on gel-resolved proteins that have been<br />

excised and proteolyzed into a discrete set of peptides. Briefly, excised proteins<br />

are subjected to in-gel digestion, typically with trypsin, and MS<br />

is used to obtain accurate masses of the resulting peptides,<br />

as well as fragmentation spectra of individual peptides. The mass spectral data are<br />

then used to identify statistically significant candidate protein matches through<br />

computer search algorithms that compare the observed MS data<br />

with theoretical peptide masses (for peptide mass fingerprinting data)<br />

or collision-induced fragmentation patterns (for tandem<br />

MS data) generated in silico from protein sequences in databases (see<br />

Chapter 19 by Fitzgibbon et al.).


1.1. Optimizing Sensitivity and Resolution<br />

There are currently two forms of CyDye labeling chemistry available:<br />

minimal labeling, which uses N-hydroxysuccinimidyl (NHS) ester<br />

reagents for low-stoichiometry labeling of proteins, largely via lysine residues,<br />

and saturation labeling, which uses maleimide reagents for the stoichiometric<br />

labeling of cysteine sulfhydryls.<br />

The most established DIGE chemistry is the “minimal labeling” method,<br />

which has been commercially available since July 2002. Here the CyDye DIGE<br />

fluors are supplied as NHS esters, which react with the ε-amine groups of<br />

lysine side chains. The three fluors are mass matched (ca. 500 Da), and carry<br />

an intrinsic +1 charge to compensate for the loss of each proton-accepting site<br />

that becomes labeled (thereby maintaining the pI of the labeled protein). Each<br />

dye molecule also adds a hydrophobic component to proteins, which along with<br />

MW influences how proteins migrate in SDS-PAGE.<br />

Minimal labeling reactions are optimized so that only 2–5% of the total<br />

number of lysine residues are labeled; on average, a given labeled<br />

protein then contains only one dye molecule. This is necessary because lysine is<br />

an abundant amino acid, and multiple labeling events may increase the hydrophobicity<br />

of some proteins to the point that they no longer remain soluble under<br />

2DE conditions. Although a given protein form may exhibit specific labeling<br />

efficiencies, these will be the same for all three dyes, allowing<br />

direct relative quantification. Minimal labeling with CyDye DIGE fluors is<br />

very sensitive, comparable to silver staining or postelectrophoretic fluorescent<br />

stains such as SYPRO Ruby, Deep Purple, or Flamingo Pink (ca. 1 ng), but with<br />

a linear response to protein concentration over five orders of magnitude (8) (see<br />

Note 3).<br />

For maleimide labeling of the cysteine sulfhydryls, the overall lower cysteine<br />

content in proteins allows for labeling of these residues to saturation without<br />

increasing the overall hydrophobicity of the proteins enough to cause insolubility<br />

problems. Saturation labeling is ultimately more sensitive (150–500 pg,<br />

and even more so for proteins with high cysteine content). It is not as<br />

commonly used, most likely because only Cy3 and Cy5 are available with<br />

this chemistry (see Note 4), because it is blind to the small but significant<br />

population of non-cysteine-containing proteins, and because complete cysteine<br />

reduction requires additional optimization for reproducible labeling. For these<br />

reasons, saturation DIGE is usually reserved for experiments where samples<br />

are limited and the advantage of increased sensitivity outweighs these<br />

additional considerations.<br />

To maximize the information that can be gained from DIGE experiments, it is<br />

imperative that resolution of protein species within gels is optimized. Although<br />

single 2DE runs can resolve proteins with pI ranges between pH 3 and 11, and


apparent molecular mass ranges between 10 and 200 kDa, higher resolution and<br />

sensitivity can be obtained by running a series of medium range (e.g., pH 4–7,<br />

7–11) and narrow range (e.g., pH 5–6) IEF gradients with increasing protein<br />

loads, leading to an overall more comprehensive proteomic analysis (6,7,10)<br />

(see Note 5). This is analogous to gaining increased resolution and sensitivity in<br />

an LC/MS-based strategy by using multiple high-performance liquid chromatography<br />

columns with different affinity chemistries [e.g., MudPIT (12)]. Much of<br />

the sensitivity limitation associated with 2D gels can be attributed to the analysis<br />

of unfractionated whole-cell and whole-tissue extracts. Additional sensitivity<br />

can be gained by enriching for the proteins of interest, for example by analyzing<br />

prefractionated or subcellular samples, or immune complexes. However, the<br />

additional experimental manipulations required for prefractionation introduce<br />

more technical variation into the samples and necessitate more independent<br />

(biological) replicates (which can be accommodated with the DIGE internal<br />

standard methodology).<br />

The identification of proteins of interest using MS can be performed directly<br />

from the DIGE gels when protein amounts have been optimized in this way (see<br />

Subheading 3.5). Alternatively, some experimental approaches perform DIGE<br />

analysis using “analytical” gels with lower protein amounts, followed by protein<br />

excision from a secondary, “preparative” gel with higher protein amounts. This<br />

approach has its advantages when dealing with small sample amounts, as is<br />

often the case with the saturation dye chemistry, but it is also prone to uncertainties<br />

that arise from the disproportionate protein loads on the analytical and preparative gels (see<br />

Note 6). The methods presented in this protocol optimize both<br />

the DIGE data and the material for subsequent MS by using high protein loads.<br />

1.2. Optimizing Statistical Significance<br />

1.2.1. Using the Internal Standard<br />

The ability to coresolve and compare two or three samples in a single gel is<br />

attractive, because it allows for direct relative quantification for a given protein<br />

without any interference from gel-to-gel variations in migration and resolution,<br />

removing the need for running replicate gels for each sample (similar to stable<br />

isotope LC/MS-based strategies, see Chapter 10). This approach has limited<br />

statistical power, however, since confidence intervals are determined based on<br />

the overall variation within a population (see Subheading 3.6.2).<br />

Many researchers new to DIGE technology are not immediately aware of<br />

the increased statistical power and multiplexing capabilities that DIGE gains when<br />

this approach is combined with a pooled-sample mixture as an internal standard<br />

for a series of coordinated DIGE gels (13). This design will allow for repetitive<br />

measurements (vital to any type of experimental investigation), and in


98 Friedman and Lilley<br />

such a way as to both control for gel-to-gel variation and provide increased<br />

statistical confidence. In this way, statistical confidence can be measured for<br />

each individual protein based on the variance of repetitive measurements,<br />

independent of the variation in the population. Incorporating independently<br />

prepared replicate samples into the experimental design also controls for<br />

unexpected variation introduced into the samples during sample preparation.<br />

This more complex and statistically powerful experimental design is accomplished<br />

by using one of the three dyes (usually Cy2) to label an internal standard,<br />

which is composed of equal aliquots of protein from all of the samples in an<br />

experiment. The total amount of the Cy2-labeled internal standard is such that<br />

an equal aliquot can be coresolved within each DIGE gel that also contains<br />

an individual Cy3- and Cy5-labeled sample from the experiment. Since this<br />

standard is composed of all of the samples in a coordinated experiment, each<br />

protein in a given sample should be represented in the standard and thus have<br />

its own unique internal standard (see Note 7). Direct quantitative comparisons<br />

are made individually for each resolved protein between the Cy3- or Cy5-<br />

labeled samples and the cognate protein signal from the Cy2-labeled standard<br />

for that gel (without interference from gel-to-gel variation), resulting in the<br />

calculation of a standardized abundance for every spot matched across all<br />

gels within a multigel experiment. The individual signals from the internal<br />

standard are also used to normalize and compare between each in-gel direct<br />

quantitative comparison for that particular protein from the other gels. Using<br />

the Cy2-labeled standard in this fashion, therefore, allows for more precise<br />

and complex quantitative comparisons between gels, including independent<br />

(biological) sample repetition (Fig. 1).<br />
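The in-gel standardization described above can be illustrated with a minimal Python sketch. It assumes hypothetical spot volumes for a single protein and a deliberately simplified model in which each gel has its own overall intensity scale that multiplies both the sample and the Cy2 standard signal; it is not the DeCyder algorithm, which applies further normalization. The function names are illustrative only.

```python
import math
from collections import defaultdict
from statistics import mean


def log_standardized_abundance(sample_volume, standard_volume):
    """log2 ratio of a Cy3- or Cy5-labeled spot to the Cy2 standard spot
    for the same protein in the same gel; a gel-wide scale factor cancels."""
    return math.log2(sample_volume / standard_volume)


def condition_means(spots):
    """Average log2 standardized abundance per condition.

    spots: iterable of (condition, sample_volume, standard_volume) tuples,
    one per labeled sample image (hypothetical input format).
    """
    by_condition = defaultdict(list)
    for condition, sample_volume, standard_volume in spots:
        by_condition[condition].append(
            log_standardized_abundance(sample_volume, standard_volume)
        )
    return {condition: mean(values) for condition, values in by_condition.items()}


# Hypothetical volumes for one spot on three gels whose overall intensities
# differ (x1, x3, x0.5); the treated samples carry twice the standard level.
spots = [
    ("control", 1000.0, 1000.0), ("treated", 2000.0, 1000.0),  # gel 1
    ("control", 3000.0, 3000.0), ("treated", 6000.0, 3000.0),  # gel 2
    ("control", 500.0, 500.0), ("treated", 1000.0, 500.0),     # gel 3
]
means = condition_means(spots)
fold_change = 2 ** (means["treated"] - means["control"])
print(means, fold_change)  # {'control': 0.0, 'treated': 1.0} 2.0
```

Although the raw treated volumes differ sixfold between gels 2 and 3, every gel reports the same twofold change once each signal is referred to its own Cy2 standard.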

Importantly, the internal standard experimental design allows for the identification<br />

of significant changes that would not have been identified if the analyses<br />

were performed separately, even when using Cy3- and Cy5-labeled samples on<br />

the same DIGE gel (14). This experimental design also allows for multivariable<br />

analyses to be performed in one coordinated experiment, whereby statistically<br />

significant abundance changes can be quantitatively measured simultaneously<br />

between several sample types (e.g., different genotypes, drug treatments, or<br />

disease states), with repetition and without the necessity for every pairwise<br />

comparison to be made within a single DIGE gel (15,16) (see Note 8 and<br />

Chapter 17 by Carpentier et al.).<br />

1.2.2. Assessing Intersample Variation<br />

Clinical proteomics is hampered by the significant variation associated with<br />

patient samples. The largest proportion of this variation comes from biological<br />

diversity, but a significant amount may also come from variable collection



Fig. 1. Illustration of DIGE and experimental design using the mixed-sample internal<br />

standard. (A) Representative gel from a six-gel set containing three differentially<br />

labeled samples: Cy2-labeled internal standard, Cy3-labeled sample #1, and Cy5-<br />

labeled sample #2. The individual protein forms all coresolve in this one gel, but these<br />

three independently labeled populations of proteins can be individually imaged using<br />

mutually exclusive excitation/emission properties of the CyDyes. (B) Schematic of<br />

the sample loading matrix indicating gel number, CyDye labeling and three replicates<br />

(indicated as “1, 2, and 3”) of the four conditions being tested (A, B, C, D). Within<br />

the boxed regions representing each labeled sample is depicted a theoretical protein<br />

that is upregulated in condition D. Dotted lines illustrate how the protein signals from<br />

each sample are directly quantified relative to the Cy2 internal standard signal for<br />

that protein without interference from gel-to-gel variation, and how the Cy3:Cy2 and<br />

Cy5:Cy2 intragel ratios are normalized between the six gels. (C) A graphical representation<br />

of the normalized abundance ratios for this theoretical protein change. Adapted<br />

from (10).<br />

and storage of biological samples. It is of vital importance to identify changes<br />

in protein abundance that are disease specific rather than patient or sample<br />

specific.<br />

To obtain the robust data sets needed to draw<br />

accurate conclusions from clinical proteomics studies, it is, therefore, necessary<br />

to collect and store samples according to stringent, closely followed



protocols. It is also necessary to assess the biological variation within the<br />

population being tested and also within a single individual. Interindividual<br />

variation has been the focus of several studies (17,18), and determining the<br />

typical variability within a single patient (i.e., by taking longitudinal samples and<br />

assessing variability in protein abundance) and between patients establishes<br />

the minimum number of patient samples required for an experiment. This is<br />

an essential step before embarking on any large-scale and potentially costly<br />

DIGE experiment. Without this type of pretest, underpowered experiments<br />

run the risk of producing results peppered with false information (both false<br />

positives and negatives).<br />
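How pilot-study variability translates into a minimum sample number can be sketched with the standard normal-approximation formula for comparing two group means. This is a generic, swapped-in power calculation rather than a DIGE-specific method, and the pilot standard deviation and minimum difference of interest below are hypothetical inputs.

```python
import math
from statistics import NormalDist


def samples_per_group(sd, delta, alpha=0.05, power=0.8):
    """Approximate number of samples per group for a two-sided comparison of
    two means: n = 2 * (z[1 - alpha/2] + z[power])**2 * (sd / delta)**2.

    sd: within-group standard deviation (e.g., of log2 abundance) from a pilot
    delta: smallest difference in group means worth detecting, same units
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    n = 2 * (z_alpha + z_power) ** 2 * (sd / delta) ** 2
    return math.ceil(n)


# Hypothetical pilot: SD of 0.5 log2 units; detect a 2-fold change (delta = 1.0)
print(samples_per_group(sd=0.5, delta=1.0))  # 4
# Same variability, but a ~1.4-fold change (delta = 0.5) is of interest
print(samples_per_group(sd=0.5, delta=0.5))  # 16
```

The normal approximation slightly underestimates the requirement for very small groups, and correcting for thousands of simultaneous spot tests effectively lowers alpha, so real designs usually need somewhat more samples than this estimate.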

As with all complex technologies, the DIGE technique itself is subject<br />

to technical variation, which will be laboratory specific to a greater or lesser<br />

extent. However, the amplitude of this variation is generally outweighed by the<br />

biological variation associated with a typical sample set (19).<br />

1.2.3. Univariate Statistical Analyses<br />

To date, the majority of published quantitative proteomics studies using the<br />

DIGE technology have applied a univariate test, such as a Student’s t-test<br />

or analysis of variance (ANOVA), to identify protein species with significant<br />

changes in expression [(20) and Chapter 17 by Carpentier et al.]. These tests<br />

calculate the probability (p) of observing an apparent change in expression at least<br />

as large as the one measured if the samples being compared were in fact the same, that is,<br />

if the change occurred by chance alone. An expression change is typically<br />

considered significant if the calculated p-value falls below a prescribed<br />

significance threshold, usually 0.05 (meaning that, when no true change exists, about<br />

1 in 20 tests will still show an apparent change in expression by chance). For more<br />

stringent analyses, a p-value of 0.01 is often used as the significance threshold.<br />
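A per-spot significance test can be sketched in Python as follows. Because the standard library has no t distribution, this sketch swaps in an exact permutation test on the Student's t statistic instead of reading p from a t table (with SciPy installed, `scipy.stats.ttest_ind` returns the classical p-value). The log2 abundances are hypothetical.

```python
import itertools
import math
from statistics import mean, variance


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))


def permutation_p_value(a, b):
    """Exact two-sided p-value: the fraction of all relabelings of the pooled
    measurements whose |t| is at least as large as the observed |t|."""
    observed = abs(student_t(a, b))
    pooled = list(a) + list(b)
    indices = range(len(pooled))
    hits = total = 0
    for chosen in itertools.combinations(indices, len(a)):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance so that ties with the observed value are counted
        if abs(student_t(group_a, group_b)) >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical log2 standardized abundances for one spot, four replicates each
control = [0.0, 0.1, -0.1, 0.05]
treated = [1.0, 1.1, 0.9, 0.95]
p = permutation_p_value(control, treated)
print(round(p, 4), p < 0.05)  # 0.0286 True
```

Enumeration grows combinatorially, so this suits only small groups. It also shows how replicate number bounds significance: with three replicates per group, the smallest attainable exact two-sided p-value is 2/20 = 0.1.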

When employing these tests on DIGE datasets, there are several factors<br />

that must be considered if correct conclusions are to be drawn from the ensuing<br />

analyses. Student’s t-tests and ANOVA assume that the data are<br />

normally distributed and that variances are homogeneous. The measurement<br />

and correction of systematic bias within DIGE experiments have been the<br />

subject of several studies, which chart methods to optimize normalization of<br />

data sets (21,22,23).<br />

Another important consideration is the false discovery rate (FDR), the expected<br />

proportion of false positives among the changes called significant. Statistical<br />

tests such as those described above involve the simultaneous and independent<br />

testing of thousands of spots. Even if the probability of a false positive is small<br />

for each individual test, a substantial number of false positives may accumulate. There are several<br />

approaches to estimate the FDR and adjust p-values to compensate for this,



the most widely used to date being the Benjamini and Hochberg method, whose<br />

use in conjunction with DIGE data has been described by Fodor et al. (21).<br />
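The Benjamini and Hochberg adjustment itself is short enough to sketch in Python. This is a generic implementation of the published step-up procedure, not the code used by Fodor et al. or by any DIGE software package, and the ten raw p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted values
    # never decrease with rank (and never exceed 1)
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from ten spots
raw = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
adjusted = benjamini_hochberg(raw)
called = [i for i, q in enumerate(adjusted) if q <= 0.05]
print(called)  # [0, 1]
```

Five of these spots pass an unadjusted 0.05 threshold, but only two remain significant when the FDR is controlled at 5%.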

1.2.4. Multivariate Statistical Analyses<br />

Discovery-phase proteomics studies often produce large lists of proteins that are<br />

identified as changing significantly in the experiment, many of which may well<br />

be false positives. Another approach to overcoming this problem is the application of<br />

additional multivariate statistical analyses to these datasets, which can help to<br />

filter out false positives that result from whole-sample outliers (i.e., sample<br />

misclassification and/or poor sample preparation technique). These analyses,<br />

such as principal component analysis (PCA), partial least squares discriminant<br />

analysis, and unsupervised hierarchical clustering (HC) (see Figs. 2 and 3 and<br />

Chapter 16 by Marengo et al.), have recently been applied to DIGE datasets<br />

(10,24,25,26,27,28,29,30,31,32). Raw and normalized data can be exported<br />

from most DIGE software solutions (e.g., DeCyder, Progenesis), and several<br />

multivariate analyses are now available in the extended data analysis (EDA)<br />

module of the DeCyder software suite (GE Healthcare), which<br />

was specifically developed for DIGE analysis (see Subheading 3.6).<br />

These multivariate analyses work essentially by comparing the expression<br />

patterns of all (or a subset of) proteins across all samples, using the variation<br />

of expression patterns to group or cluster individual samples. Technical noise<br />

(poor sample prep, run-to-run variation) and biological noise (normal differences<br />

between samples, especially present in clinical samples) are almost always<br />

[Fig. 2 panel: PC1 vs. PC2 score plot with groups labeled Δfur, Heme, –Fe, and control]<br />

Fig. 2. Illustration of the use of principal component analysis. DIGE was used<br />

to analyze changes in Staphylococcus proteins in response to genetic and chemical<br />

alterations affecting iron utilization. Adapted from (24).



Fig. 3. Hierarchical clustering (by average distance correlation) of representative<br />

novel circadian proteins detected by 2D DIGE of soluble protein extracts from mouse<br />

liver. Pale gray represents low levels of protein expression, black represents intermediate<br />

levels, and dark gray represents high levels of expression. Adapted from (32).<br />

associated with any analytical dataset of this nature, and may well override<br />

any variation that arises due to actual differences related to the biological<br />

questions being tested. Unsupervised clustering of related samples, therefore,<br />

adds confidence that a “list of proteins” changing in a DIGE experiment<br />

is not arising stochastically (10).
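The idea of grouping samples by their overall expression patterns can be sketched with a bare-bones PCA in Python. This computes only the first principal component by power iteration on the covariance matrix; it is an illustrative stand-in for the EDA module or a numerical library, and the six samples by three spots are hypothetical log abundances.

```python
import math


def first_principal_component(samples, iterations=500):
    """Scores of each sample on the first principal component (PC1).

    samples: list of equal-length rows, one row per sample (gel image) and one
    column per matched spot, e.g., log standardized abundances.
    """
    n, p = len(samples), len(samples[0])
    col_means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - col_means[j] for j in range(p)] for row in samples]
    # Sample covariance matrix of the spots (p x p)
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Power iteration converges to the eigenvector with the largest eigenvalue
    v = [1.0 + j for j in range(p)]  # arbitrary non-constant starting vector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered sample onto PC1 (the overall sign is arbitrary)
    return [sum(r[j] * v[j] for j in range(p)) for r in centered]


# Hypothetical log abundances: 3 control and 3 treated samples, 3 spots each
data = [
    [0.0, 0.1, 0.0], [0.1, 0.0, 0.05], [-0.05, 0.05, 0.0],      # control
    [1.0, -0.9, 0.02], [1.1, -1.0, 0.0], [0.95, -1.05, 0.03],  # treated
]
scores = first_principal_component(data)
print([round(s, 2) for s in scores])
```

Controls and treated samples fall on opposite sides of zero on PC1. A sample that landed with the wrong group would flag the kind of whole-sample outlier described above.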



1.3. DIGE in the Clinical Setting<br />

Although the potential of DIGE in clinical studies is only beginning<br />

to be explored [for example, see (29,30)], many studies have been published<br />

demonstrating the feasibility and benefit of DIGE/MS using small patient<br />

cohorts for preliminary studies in colon (14), liver (33,34,35), breast (36,37),<br />

esophageal (38,39), and pancreatic cancers (40), as well as other important<br />

clinical studies such as Severe Acute Respiratory Syndrome (SARS) (41). Many<br />

studies also explore the important benefit of procuring samples using laser<br />

capture microdissection (LCM – see Chapters 3, 5, and 9 by Diaz et al., Zhang<br />

et al., and Mustafa et al., respectively) for a highly enriched population of the<br />

cells under study (16,30,42,43,44). These LCM studies necessitate the use of<br />

the saturation chemistry owing to its increased sensitivity, despite its limited multiplexing<br />

power, and typically require secondary preparative gels with higher<br />

protein loads to enable protein identification by MS.<br />

The study of Suehara et al. (29) illustrates the utility of a multivariable<br />

DIGE/MS analysis with an extended sample set pertinent for a clinical study.<br />

Eighty soft tissue sarcoma samples comprising seven different histological<br />

backgrounds were analyzed. Using the saturation DIGE fluors, individual<br />

samples were labeled with Cy5 and multiplexed with a pooled-sample internal<br />

standard (labeled in bulk with Cy3) for each DIGE gel. Using high-resolution<br />

2D gel separations and a combination of multivariate statistical tools (support<br />

vector machines, leave-one-out cross-validation, PCA, and HC), the study<br />

identified a small subset of proteins including tropomyosin and HSP27 that<br />

were able to discriminate between the different classes of tumors. HSP27 in<br />

particular was part of a subclass of discriminating proteins that could distinguish<br />

between leiomyosarcoma and malignant fibrous histiocytoma (MFH), as<br />

well as correlate with patient survival between low-risk and high-risk groups.<br />

HSP27 has long been associated with prognosis in MFH as well as in other<br />

human carcinomas (45).<br />

2. Materials<br />

This chapter assumes a solid understanding of 2D gel electrophoresis and<br />

will focus on the design and implementation of the DIGE method using the<br />

pooled-sample internal standard methodology and the minimal dye chemistry<br />

for Cy2, Cy3, and Cy5, with notes provided for saturation labeling chemistry.<br />

2.1. Cell Lysis Buffers<br />

1. TNE: 50 mM Tris–HCl pH 7.6, 150 mM NaCl, 2 mM EDTA pH 8.0, 2 mM<br />

DTT, 1% (v/v) NP-40.



2. RIPA buffer: 50 mM Tris–HCl pH 8.0, 150 mM NaCl, 1% NP-40, 0.5% deoxycholic<br />

acid, 0.1% SDS.<br />

3. Two-dimensional gel electrophoresis lysis buffer: 7 M urea, 2 M thiourea, 4%<br />

CHAPS, 2 mg/mL DTT, 50 mM Tris–HCl pH 8.0.<br />

4. ASB14 lysis buffer: 7 M urea, 2 M thiourea, 2% amidosulfobetaine 14, 50 mM<br />

Tris–HCl pH 8.0.<br />

NB: depending on the sample, it may also be necessary to add protease<br />

inhibitors and phosphatase inhibitors [sodium pyrophosphate (1 mM), sodium<br />

orthovanadate (1 mM), beta-glycerophosphate (10 mM) and sodium fluoride<br />

(50 mM)] to the chosen lysis buffer (see Subheading 3.1).<br />

2.2. SDS-Polyacrylamide Gel Electrophoresis<br />

1. Immobilized pH gradient (IPG) strips and accompanying ampholyte mixtures can<br />

be purchased from a number of commercial vendors. Strip lengths vary from<br />

7 cm to high-resolution 24 cm strips, and pH ranges vary from wide-range (e.g.,<br />

pH 3–11) to high-resolution narrow-range (e.g., pH 5–6) strips.<br />

2. Bind silane working solution (50 mL): 40 mL ethanol, 1 mL acetic acid, 50 μL<br />

bind silane solution (GE Healthcare), 9 mL water (see Note 9).<br />

3. 4× separating gel buffer: 1.5 M Tris-base, pH 8.8.<br />

4. 30% acrylamide:bis-acrylamide (37.5:1), N,N,N′,N′-tetramethylethylenediamine (TEMED),<br />

and ammonium persulfate.<br />

5. 10× SDS-PAGE running buffer (1 L): 30.25 g Tris-base, 144.13 g glycine, 10 g<br />

SDS (0.1% at 1×).<br />

6. Fixing solution for SyproRuby staining (1 L): 100 mL methanol, 70 mL acetic<br />

acid, 830 mL water. SyproRuby stain is available from several commercial<br />

sources and can be substituted by other total protein stains, such as Deep Purple<br />

(GE Healthcare) or Flamingo Pink (BioRad).<br />

7. Two-dimensional equilibration buffer: 6 M urea, 50 mM Tris-base pH 8.8, 30%<br />

glycerol, 2% SDS, trace bromophenol blue.<br />

8. Water-saturated butanol (see Note 10).<br />

9. Dithiothreitol (store desiccated).<br />

10. Iodoacetamide (store desiccated, keep in the dark).<br />

2.3. DIGE Labeling Materials<br />

1. N,N-dimethylformamide (DMF) (see Note 11).<br />

2. Labeling (L) buffer: 7 M urea, 2 M thiourea, 4% CHAPS, 30 mM Tris-base<br />

(do not adjust the pH, but ensure that the pH of the final solution is between 8.0 and 9.0), 5 mM<br />

magnesium acetate (see Note 12). Alternatively, 4% CHAPS can be replaced<br />

with 2% ASB14, especially when membrane-rich samples are being<br />

analyzed.<br />

3. Rehydration (R) buffer: 7 M urea, 2 M thiourea, 4% CHAPS, 2 mg/mL DTT<br />

(13 mM; 0.2% w/v).



4. Cyanine dyes with NHS-ester chemistry for minimal labeling (Cy2, Cy3, and<br />

Cy5), and with maleimide chemistry for saturation labeling (Cy3 and Cy5) are<br />

available from GE Healthcare as dry solids.<br />

5. Quenching solution (for minimal labeling): 10 mM lysine.<br />

6. Dithiothreitol reduction stock solution: 200 mg/mL DTT.<br />
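The concentrations in the buffer recipes above can be cross-checked with a short Python sketch. The molecular weights are standard values for DTT, urea, and thiourea; the 50 mL batch size is an arbitrary example, and the helper names are illustrative. The conversion shows that 2 mg/mL DTT corresponds to about 13 mM, or 0.2% (w/v).

```python
# Molecular weights (g/mol) of the buffer components used here
MW = {"DTT": 154.25, "urea": 60.06, "thiourea": 76.12}


def mg_per_ml_to_mM(mg_per_ml, mw):
    """mg/mL equals g/L; dividing by g/mol gives mol/L, times 1000 gives mM."""
    return mg_per_ml / mw * 1000


def mg_per_ml_to_percent_wv(mg_per_ml):
    """% (w/v) is grams per 100 mL, so 1 mg/mL corresponds to 0.1%."""
    return mg_per_ml / 10


def grams_needed(molar, mw, volume_ml):
    """Mass of solid needed for a given molar concentration and final volume."""
    return molar * mw * volume_ml / 1000


print(round(mg_per_ml_to_mM(2, MW["DTT"]), 1))  # 13.0
print(mg_per_ml_to_percent_wv(2))                # 0.2
# Solids for 50 mL of a 7 M urea / 2 M thiourea buffer (e.g., L- or R-buffer)
print(round(grams_needed(7, MW["urea"], 50), 2),
      round(grams_needed(2, MW["thiourea"], 50), 2))  # 21.02 7.61
```

Because urea and thiourea occupy a large fraction of the final volume, the solids are dissolved first and the solution is then brought up to volume, rather than added to a fixed volume of water.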

3. Methods<br />

DIGE is a powerful technique for quantitative multivariable differential<br />

display proteomics. However, the quality of the data will only be as good as the<br />

quality of the underlying 2D gel electrophoresis technology upon which it is<br />

based. The main focus of this chapter is to provide detailed notes on the DIGE<br />

technology; however, some key considerations for successful high-resolution 2D<br />

gel electrophoresis are also provided. This section describes methods associated<br />

with labeling using the minimal CyDyes.<br />

3.1. Sample Preparation<br />

The key to success for any analytical measurement begins with robust sample<br />

preparation. This not only includes the buffers and materials used, but also the<br />

nature of the samples and the way in which they are procured. The addition<br />

of exogenous materials (such as DNase and RNase), or allowing uncontrolled<br />

manipulation of the sample (such as conditions that may lead to proteolysis), can<br />

severely hamper and sometimes completely prevent an analysis. Care should<br />

be taken to guard against common laboratory contaminants (e.g., mycoplasma<br />

in tissue culture), which, if present, may be detected as significant changes by<br />

DIGE, either because they are present in only a subset of samples or because they<br />

respond to the experimental perturbation.<br />

1. Prepare protein extracts using any method of preference.<br />

The appropriate amount of protein can subsequently be precipitated prior to<br />

resuspension in the CyDye labeling buffer (see Subheading 3.2). Guarding against<br />

proteolysis and loss of post-translational modifications (e.g., phosphorylation)<br />

is of paramount importance.<br />

Care should be taken not to use reagents that will resolve on the 2D gel, such as<br />

soybean trypsin inhibitor. Low-molecular-weight inhibitors such as aprotinin, leupeptin,<br />

pepstatin A, antipain, 4-(2-aminoethyl)benzenesulfonyl fluoride hydrochloride<br />

(AEBSF), sodium orthovanadate, okadaic acid, and microcystin, among others,<br />

are far better choices.<br />

2. Lyse cells using standard lysis buffers such as TNE and RIPA buffers, or even<br />

the buffers used for 2D gel electrophoresis.



All of these buffers have the capability of producing high-resolution samples for<br />

2DE. In most cases, reagents that would otherwise interfere with<br />

CyDye labeling (such as those that contain primary amines) will be removed prior<br />

to labeling by protein precipitation (see Subheading 3.2).<br />

3. Sonicate cells if necessary to improve sample quality.<br />

Sonication improves sample quality by disrupting nucleic acids, which are subsequently<br />

removed by sample cleanup (see Subheading 3.2) along with phospholipids.<br />

Both of these nonproteinaceous ionic components can obliterate the<br />

resolution during IEF.<br />

Short bursts with a tip-sonicator are suggested. It is important to keep the system<br />

chilled, especially in the presence of urea-containing samples that should never<br />

be heated (see Note 12).<br />

4. Determine the protein concentration of the sample using an assay that is<br />

compatible with the buffer in which the proteins are extracted.<br />

Although the buffers used for DIGE are adequately chaotropic, the CHAPS and thiourea<br />

they contain interfere with the Bradford and bicinchoninic acid assays, making<br />

the data inaccurate and unreliable. In these cases, aliquots should be precipitated and<br />

redissolved in a suitable buffer prior to quantification, or a detergent-compatible<br />

assay should be used.<br />

5. Aim to use a protein concentration between 1 and 10 mg/mL.<br />

Too dilute and it will be difficult to quantitatively recover proteins following<br />

precipitation cleanup (see Subheading 3.2); too concentrated and it will be<br />

difficult to accurately dispense the appropriate volume for the experiment.<br />

Freeze-thaw cycles should also be kept to a minimum; freezing samples in 1 mL<br />

aliquots or less will usually suffice.<br />

3.2. Sample Cleanup<br />

The desired amount of sample to be used in the experiment should be<br />

precipitated prior to labeling. This both removes nonproteinaceous ions from<br />

the sample (e.g., nucleic acids, phospholipids) that can interfere with IEF<br />

and transfers the proteins into a labeling buffer optimized for CyDye<br />

labeling and subsequent IEF. Determine how much total protein will be on<br />

each gel, and precipitate ½ of that amount for each sample to be run on that<br />

gel. This is straightforward for a two-component separation, but it also works<br />

for multigel experiments in which 1/3 of the total protein amount on each gel<br />

comes from the pooled-sample internal standard (see Table 1). Precipitate only<br />

what is needed for each sample for the experiment; too much material may<br />

create pellets that are difficult to resolubilize completely.


Table 1<br />

Experimental Design for CyDye Labeling Using a Pooled-Sample Internal Standard<br />

Samples: Control-1 and Treated-1 (Gel 1); Control-2 and Treated-2 (Gel 2); Control-3 and Treated-3 (Gel 3); Pool<br />

Precipitated amount: 150 μg of each sample<br />

L-buffer: 24 μL per sample<br />

Aliquot for labeling: 16 μL per sample; 8 μL of each sample (×6) is transferred to the Pool<br />

Cy2: 6 μL (Pool)<br />

Cy3: 2 μL (one sample per gel; three samples in total)<br />

Cy5: 2 μL (the other sample on each gel; three samples in total)<br />

Incubate 30 min on ice in the dark<br />

Lysine (quench): 2 μL per sample; 6 μL (Pool)<br />

Incubate 10 min on ice in the dark<br />

Total volume: 20 μL per sample; 60 μL (Pool)<br />

For each gel, combine the quenched Cy3- and Cy5-labeled samples and add 1/3 of the<br />

quenched Cy2-labeled pooled mixture: 20 + 20 + 20 μL per gel<br />

2× R-buffer: 60 μL per gel<br />

Total: 120 μL per gel<br />

R-buffer: to final volume (Vf) per gel<br />

This table illustrates a typical DIGE labeling experiment, as described in Subheadings 3.2 and 3.3.<br />




Many precipitation methods are available; the following MeOH/CHCl3<br />

protocol works well for DIGE and can easily be performed in 1.5 mL<br />

tubes [adapted from (46)]:<br />

1. Bring the predetermined amount of protein extract to 100 μL with water.<br />

2. Add 300 μL (3 volumes) water.<br />

3. Add 400 μL (4 volumes) methanol.<br />

4. Add 100 μL (1 volume) chloroform.<br />

5. Vortex vigorously and centrifuge; the protein precipitate should appear at the<br />

interface.<br />

6. Remove the water/MeOH mix above the interface without disturbing it.<br />

The precipitated proteins often do not form a visibly white<br />

interface, so particular care should be taken at this step.<br />

7. Add another 400 μL methanol to wash the precipitate.<br />

8. Vortex vigorously and centrifuge; the protein precipitate should now pellet to<br />

the bottom of the tube.<br />

9. Remove the supernatant and briefly dry the pellets in a vacuum centrifuge.<br />

10. Resuspend the pellets in a suitable amount of CyDye labeling buffer (L-buffer,<br />

see Table 1).<br />
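The reagent ratios of this MeOH/CHCl3 protocol can be captured in a small Python helper, for example when a sample cannot be brought to exactly 100 μL. This is a sketch under the assumption that the 1:3:4:1 sample:water:methanol:chloroform ratios scale linearly; the function name and the 1.5 mL capacity check are illustrative.

```python
# Ratios relative to the starting aqueous sample volume (step 1 uses 100 uL)
RATIOS = {"water": 3, "methanol": 4, "chloroform": 1}


def meoh_chcl3_volumes(sample_ul=100.0, tube_capacity_ul=1500.0):
    """Reagent volumes (uL) for steps 2-4 and the step 7 wash, scaled linearly."""
    volumes = {reagent: factor * sample_ul for reagent, factor in RATIOS.items()}
    total = sample_ul + sum(volumes.values())
    if total > tube_capacity_ul:
        raise ValueError(f"{total} uL exceeds the {tube_capacity_ul} uL tube")
    volumes["methanol_wash"] = 4 * sample_ul
    volumes["total_after_step_4"] = total
    return volumes


print(meoh_chcl3_volumes())  # total_after_step_4 -> 900.0
```

At the 100 μL starting volume the mixture totals 900 μL, which leaves headroom for vigorous vortexing in a 1.5 mL tube; starting volumes much above 150 μL would need larger tubes or splitting.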

An alternative widely used precipitation method is as follows:<br />

1. Add 5 volumes of cold 0.1 M ammonium acetate in methanol.<br />

2. Leave at –20°C for 12 h or overnight.<br />

3. Centrifuge at ∼3000 rpm (1400×g) for 10 min at 4°C and remove the supernatant.<br />

4. A pellet of protein should be visible at this stage.<br />

5. To wash the pellet, add 80% 0.1 M ammonium acetate in methanol and mix to<br />

resuspend the protein.<br />

6. Centrifuge at 3000 rpm (1400×g) for 10 min at 4°C and remove the supernatant.<br />

7. To dehydrate the pellet add 80% acetone and resuspend the pellet by mixing.<br />

8. Centrifuge at 3000 rpm (1400×g) for 10 min at 4°C and remove the supernatant.<br />

9. Dry pellet for 15 min by leaving open tube in a laminar flow cabinet.<br />
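Since the relationship between rpm and g depends on the rotor radius, the speeds above are only meaningful for a particular centrifuge. The standard conversion is RCF = 1.118 × 10⁻⁵ × r (cm) × rpm². The Python sketch below applies it; the 8 cm radius is a hypothetical microcentrifuge value, so check the radius of your own rotor.

```python
import math

RCF_CONSTANT = 1.118e-5  # RCF = 1.118e-5 * radius_cm * rpm**2


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) at the given rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf, radius_cm):
    """Speed (rpm) needed to reach the given RCF at the given rotor radius."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


# Radius implied by the protocol's "3000 rpm (1400 x g)" pairing
implied_radius_cm = 1400 / (RCF_CONSTANT * 3000 ** 2)
print(round(implied_radius_cm, 1))  # 13.9
# Speed for 1400 x g on a hypothetical 8 cm microcentrifuge rotor
print(round(rpm_from_rcf(1400, radius_cm=8.0)))
```

The "3000 rpm (1400×g)" pairing therefore implies a rotor radius of roughly 14 cm. On a smaller rotor the same force requires a higher speed, so setting the g-force rather than the rpm is the safer choice.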

3.3. DIGE Experimental Design<br />

1. Start with a preliminary gel. All experiments should start with a preliminary gel<br />

on representative samples to ensure equivalent protein amounts between samples,<br />

and that the highest resolution and sensitivity are obtained before embarking on a<br />

multigel DIGE experiment (see Notes 13 and 6). The preliminary gel will also show<br />

any problems with the sample preparation that may be corrected by adjusting the<br />

procurement methods (see Subheading 3.1). This step can also be used to determine<br />

the maximal amount of protein that can be loaded without adversely affecting resolution.<br />

The preliminary gel needs only to test one or two of the samples of a much<br />

larger experiment. This gel can simply be stained with a total protein stain (e.g.,<br />

Sypro Ruby or Deep Purple) to visually inspect the resolution and sensitivity.



Alternatively, the gel can contain two different samples prelabeled with Cy3<br />

and Cy5 and coresolved (see Note 14).<br />

2. Choose a suitable pH gradient for the IEF. Precast IEF strips are commercially<br />

available from several vendors. The longest strips currently available are 24 cm long, providing<br />

the highest resolving power for a given pH range. Medium-range IEF gradients<br />

(e.g., pH 4–7) offer the best trade-off between overall resolution and sensitivity.<br />

Subsequent experiments can then be designed to resolve proteins in the basic<br />

range (pH 7–11) and in narrow pI ranges with commensurate increases in protein<br />

loading to gain access to the lower-abundance proteins in a given sample (see<br />

Note 5). In this way a more comprehensive picture of the proteomes under study<br />

can be obtained.<br />

3. Incorporate a pooled-sample mixture internal standard on every DIGE gel in<br />

a coordinated experiment. This internal standard, usually labeled with Cy2, is<br />

composed of an equal aliquot of every sample in the entire experiment, and<br />

therefore represents every protein present across all samples in an experiment. The<br />

use of this pooled-sample internal standard on every DIGE gel in a coordinated<br />

experiment allows for the facile comparison of independent sample replicates<br />

with increased statistical confidence. This experimental design also enables the<br />

simultaneous quantitative comparison between multiple variables in a coordinated<br />

experiment (Fig. 1).<br />

4. Plan out which samples will be labeled with which dyes ahead of time. For<br />

minimal dye labeling chemistry (see Subheading 3.4), each gel will contain two<br />

individual samples labeled with either Cy3 or Cy5, and an equal amount of the<br />

pooled-sample internal standard. The example outlined in Table 1 is for a two-component<br />

comparison repeated in triplicate, with 300 μg total protein loaded<br />

onto each of three gels. In this case, 150 μg of each sample should be precipitated<br />

(see Subheading 3.2), resuspended in L-buffer and then split 2:1. Two-thirds<br />

of each sample (100 μg) will be individually labeled with either Cy3 or Cy5.<br />

The remaining 1/3 of each sample will be pooled together and labeled with Cy2<br />

to serve as an internal standard. This ensures that there is enough<br />

Cy2-labeled internal standard to load an amount equal to each Cy3- or Cy5-labeled sample<br />

onto each gel (see Note 15).<br />
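The protein bookkeeping of this design can be sketched in Python. It assumes, as in Table 1, two individually labeled samples per gel, with each precipitated sample split 2:1 between its own Cy3/Cy5 labeling and the Cy2 pool. The function and key names are hypothetical.

```python
def dige_plan(total_per_gel_ug, n_gels, resuspension_ul=24.0):
    """Protein amounts (ug) and volumes (uL) for a pooled-standard DIGE design
    with one Cy3- and one Cy5-labeled sample per gel plus the Cy2 pool."""
    n_samples = 2 * n_gels
    precipitate_ug = total_per_gel_ug / 2      # 1/2 of the gel total per sample
    labeled_ug = precipitate_ug * 2 / 3        # 2/3 labeled with Cy3 or Cy5
    pooled_ug = precipitate_ug / 3             # 1/3 contributed to the Cy2 pool
    pool_total_ug = pooled_ug * n_samples
    return {
        "precipitate_per_sample_ug": precipitate_ug,
        "labeled_per_sample_ug": labeled_ug,
        "pooled_per_sample_ug": pooled_ug,
        "pool_total_ug": pool_total_ug,
        "cy2_standard_per_gel_ug": pool_total_ug / n_gels,
        "labeled_aliquot_ul": resuspension_ul * 2 / 3,
        "pooled_aliquot_ul": resuspension_ul / 3,
    }


plan = dige_plan(total_per_gel_ug=300, n_gels=3)
print(plan)
```

For the Table 1 example (300 μg per gel, three gels), each sample yields 150 μg precipitated, 100 μg labeled and 50 μg pooled, with 16 μL and 8 μL aliquots. The pool totals 300 μg, giving 100 μg of Cy2 standard per gel. The standard per gel always equals the labeled amount per sample, whatever the number of gels.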

3.4. CyDye Labeling<br />

All steps are performed on ice. The following protocol is for sample loading<br />

via rehydration of IPG strips, and assumes incorporation of a pooled-sample<br />

internal standard to coordinate many samples across multiple DIGE gels simultaneously.<br />

The steps are summarized in Table 1 (see Note 16).<br />

1. Resuspend precipitated sample in 24 μL labeling (L) buffer. Remove 8 μL (1/3<br />

of sample) and place into a new tube that will contain the pooled-sample internal<br />

standard (8 μL from all of the other individual samples will be pooled into this<br />

tube) (see Note 17).



2. CyDyes are purchased as dry solids and should be reconstituted to 10× stock<br />

solutions (1 nmol/μL) in fresh DMF. Dilute stock solutions of CyDyes 1:10 in<br />

fresh DMF to a final working concentration of 100 pmol/μL (see Note 11).<br />

3. Label each sample (50–250 μg) with 2–4 μL (200–400 pmol) of either Cy3 or Cy5<br />

working dilution for 30 min on ice in the dark. Label the pooled-sample mixture<br />

with 2–4 μL (200–400 pmol) of Cy2 working dilution for every equivalent amount<br />

of sample present in the pooled standard as compared with the individually labeled<br />

samples. That is, if 100 μg of each sample is labeled with 200 pmol of Cy3 or<br />

Cy5, then 50 μg of each of these samples is present in the pooled standard, and<br />

200 pmol of Cy2 is used for every 100 μg of pooled standard (see Table 1 and<br />

Note 18).<br />

4. Quench reactions with 2 μL of 10 mM lysine for 10 min on ice in the dark.<br />

5. For each gel, combine the quenched Cy3- and Cy5-labeled samples and add 1/3<br />

of the quenched Cy2-labeled pooled mixture.<br />

6. To each tripartite mixture, add an equal volume of 2× R-buffer and incubate on ice for 10 min. 2× R-buffer is R-buffer supplemented with an additional 2 mg/mL DTT, added from the 200 mg/mL DTT stock solution. DTT is omitted from the L-buffer to prevent unfavorable interactions with the CyDyes. Adding an equal volume of 2× R-buffer to the quenched reactions therefore brings the reducing agent to its 1× final concentration in the total reaction volume.

7. Add R-buffer (1× DTT concentration) to the final volume suggested by the manufacturer for the given IPG strip length (e.g., 450 μL for 24 cm strips). Add the appropriate volume of IPG buffer ampholines to 0.5% (v/v) final for IEF. Rehydrate the dry IPG strips with this mixture for >16 h, then proceed with IEF (see Subheading 3.5.3 and Note 19).
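The amounts in steps 1–6 follow from simple proportions, so it can help to compute them up front for a given design. The sketch below is a hypothetical helper (`plan_labeling` and `dtt_stock_ul` are illustrative names, not part of any vendor software). It assumes each sample is split 2/3 for Cy3/Cy5 labeling and 1/3 for the Cy2 pool, that samples are run as Cy3/Cy5 pairs (one gel per two samples), and that dye is added to the individual samples and to the pool at the same pmol-per-μg ratio (200 pmol per 100 μg by default, as in the example in step 3).

```python
from dataclasses import dataclass

# Working dilution from step 2: 1 nmol/uL stock diluted 1:10 in DMF.
WORKING_PMOL_PER_UL = 100.0


@dataclass
class LabelingPlan:
    n_gels: int
    ug_labeled_per_sample: float   # portion labeled with Cy3 or Cy5
    ug_pooled_per_sample: float    # portion added to the Cy2 internal standard
    pool_total_ug: float
    pool_per_gel_ug: float
    cy3_cy5_pmol_per_sample: float
    cy2_pmol_total: float
    cy2_working_ul_total: float


def plan_labeling(n_samples: int, ug_per_sample: float,
                  pmol_per_100ug: float = 200.0) -> LabelingPlan:
    """Hypothetical helper: split each sample 2/3 (Cy3/Cy5) and 1/3 (Cy2 pool)."""
    if n_samples < 2 or n_samples % 2:
        raise ValueError("Cy3/Cy5 pairing needs an even number of samples")
    labeled = ug_per_sample * 2 / 3
    pooled = ug_per_sample / 3
    pool_total = pooled * n_samples
    n_gels = n_samples // 2          # one Cy3 and one Cy5 sample per gel
    pmol_per_ug = pmol_per_100ug / 100.0
    cy2_pmol = pool_total * pmol_per_ug
    return LabelingPlan(
        n_gels=n_gels,
        ug_labeled_per_sample=labeled,
        ug_pooled_per_sample=pooled,
        pool_total_ug=pool_total,
        pool_per_gel_ug=pool_total / n_gels,
        cy3_cy5_pmol_per_sample=labeled * pmol_per_ug,
        cy2_pmol_total=cy2_pmol,
        cy2_working_ul_total=cy2_pmol / WORKING_PMOL_PER_UL,
    )


def dtt_stock_ul(final_volume_ul: float, extra_mg_per_ml: float = 2.0,
                 stock_mg_per_ml: float = 200.0) -> float:
    """uL of DTT stock giving the extra DTT in a final volume of 2x R-buffer (C1V1 = C2V2)."""
    return final_volume_ul * extra_mg_per_ml / stock_mg_per_ml


plan = plan_labeling(n_samples=6, ug_per_sample=150)
print(plan)
print(dtt_stock_ul(1000))  # uL of 200 mg/mL stock per 1 mL of 2x R-buffer
```

For six samples of 150 μg each, this gives 100 μg labeled per sample with 200 pmol Cy3 or Cy5, a 300 μg pool labeled with 600 pmol Cy2 (6 μL of working dilution), and 100 μg of pool per gel across three gels, so each gel receives as much internal standard as it does of each individually labeled sample.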

3.5. 2D Gel Electrophoresis and Poststaining

As a result of minimal labeling, quantification with the CyDyes is based on only the 2–5% of each protein that is labeled, and this labeled portion may migrate at a higher apparent molecular mass than the bulk of the unlabeled protein because of the added mass and hydrophobicity of the dyes (an effect exacerbated in lower-Mr species). To ensure that the maximum amount of protein is excised for subsequent in-gel digestion and MS, minimally labeled 2D DIGE gels are poststained with a total protein stain such as SyproRuby or Deep Purple. Accurate excision is further ensured by affixing the second-dimension gel to a presilanized glass plate during gel casting so that the gel dimensions do not change during the analysis (see Notes 20 and 21).

These methods assume the use of the Ettan 2D electrophoresis system (GE Healthcare) but are easily adaptable to other commercially available systems. They also assume the use of high-resolution 24 cm × 20 cm gels.

1. Special gels for second-dimension SDS-PAGE. Using low-fluorescence glass plates, pretreat one plate for each gel with 3–5 mL bind silane working solution,


Optimizing DIGE Technology 111<br />

carefully wiping the entire surface of the plate with a lint-free wipe. Leave the treated plates covered with lint-free wipes for several hours to allow sufficient outgassing of fumes (which may contain bind silane) before assembling the gel plates and casting the second-dimension SDS-PAGE gels (see Note 22).

2. Assemble the plates and pour 12% homogeneous SDS-PAGE gel(s) using the appropriate amounts of 30% acrylamide stock and 4× separating gel buffer for the number of gels being poured (see Note 23). Overlay the gels with water-saturated butanol for several hours to provide a straight, level surface on which to place the focused IPG strip (see Note 10).

3. Perform IEF of the combined tripartite-labeled samples, brought up to final volume with 1× R-buffer and passively rehydrated into IPG strips for >16 h (see Subheading 3.4.7), using an IPGphor II IEF unit (GE Healthcare) (see Note 24).

4. Equilibrate the focused IPG strips in the second-dimension equilibration buffer. During this step, the cysteine sulfhydryls in the focused proteins are reduced and carbamidomethylated: the strips are first incubated in equilibration buffer supplemented with 1% DTT for 20 min at room temperature, followed by fresh equilibration buffer supplemented with 2.5% iodoacetamide for an additional 20 min at room temperature (see Note 25).

5. Place each equilibrated IPG strip on top of an SDS-PAGE gel precast between low-fluorescence glass plates. Use a thin card or ruler to carefully tamp the IPG strip down onto the SDS-PAGE gel, removing air bubbles at the interface (see Notes 26 and 27).

6. Perform second-dimension SDS-PAGE at constant wattage, using ≪1 W/gel for at least 1 h prior to ramping up to



two spot patterns, whereas most of the commercial products contain proprietary algorithms for protein spot detection, intergel matching, and protein spot quantification, and even utilities for building web-based tools for data dissemination. Many include the ability to average replicate patterns into a single virtual pattern for use in a comparative study. All are designed to compare multiple spot patterns and to quantify abundance changes for individual proteins between experimental conditions.

Several software packages allow for the analysis of DIGE data. The DeCyder suite of software tools was developed specifically to support the DIGE platform when this technology was first marketed by GE Healthcare and is therefore used as an example here. The differential in-gel analysis (DIA) module of DeCyder is used for direct quantification of protein spot volume ratios between the triply codetected signals emanating from each resolved protein, and can be used for the simplest form of DIGE experiment: pairwise comparisons with N = 1. More advanced DIGE experiments, which use the internal standard to cross-compare replicate samples in pairwise and multivariable analyses (N > 3), are handled by the biological variation analysis (BVA) module of DeCyder. In a BVA experiment, the signals emanating from the internal standard are used both for direct quantification within each DIGE gel in a coordinated set (using the DIA module) and for normalization and protein spot pattern matching between gels (see Note 31). This allows Student’s t-test and ANOVA statistics to be calculated for individual abundance changes (see Subheading 3.6.2 and Table 2). BVA is also used to match patterns between SyproRuby- and CyDye-stained images to facilitate protein excision for subsequent MS (see Notes 20, 21, and 30).
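The in-gel standardization that BVA builds on can be illustrated with a short sketch. This is a simplified, hypothetical model, not DeCyder's actual algorithm: it assumes each channel is normalized by its summed spot volume, that a spot's standardized abundance is the log10 ratio of its normalized sample signal (Cy3 or Cy5) to its normalized Cy2 signal on the same gel, and that group means of these values are converted to a fold change in which decreases are reported as negative values (e.g., -2 for a halving). The function names are illustrative.

```python
import math
from statistics import mean


def standardized_log_abundance(sample_vol, standard_vol, sample_total, standard_total):
    """log10 of a spot's channel-normalized signal relative to the in-gel Cy2 standard."""
    return math.log10((sample_vol / sample_total) / (standard_vol / standard_total))


def average_ratio(group_a, group_b):
    """Fold change of group B over group A from mean log10 standardized abundances.

    Increases are returned as ratios >= 1; decreases as -1/ratio (signed fold change).
    """
    ratio = 10 ** (mean(group_b) - mean(group_a))
    return ratio if ratio >= 1 else -1 / ratio


# One spot in four samples (two control, two treated), each expressed
# relative to the Cy2 standard on the gel it was run on (hypothetical volumes).
control = [standardized_log_abundance(100, 100, 1e6, 1e6),
           standardized_log_abundance(110, 105, 1e6, 1e6)]
treated = [standardized_log_abundance(210, 100, 1e6, 1e6),
           standardized_log_abundance(190, 98, 1e6, 1e6)]
print(round(average_ratio(control, treated), 2))
```

Because every sample is expressed relative to the same pooled standard on its own gel, values from different gels become directly comparable, which is what allows replicate samples to be grouped across a coordinated gel set.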

3.6.2. Experimental Design and Statistical Confidence

In the simplest form of a DIGE experiment, two or three samples are each labeled with a different one of the three dyes and separated in the same gel for direct pairwise comparisons. In this case, the software first normalizes the entire signal for each CyDye channel and then calculates the protein spot volume ratio for each protein pair. A normal distribution is modeled over the actual distribution of protein pair volume ratios, and a threshold of two standard deviations from the mean of this distribution approximates the 95% confidence level for significant abundance changes.
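The N = 1 thresholding described above can be sketched as follows. This is a simplification: it assumes total-signal normalization of each channel and estimates the normal distribution from the mean and standard deviation of all log ratios, whereas DeCyder's DIA module applies its own model fit, so thresholds will not match the software exactly. `flag_changes` is a hypothetical name.

```python
import math
from statistics import mean, stdev


def flag_changes(vol_a, vol_b, n_sd=2.0):
    """Indices of spots whose log volume ratio (B/A) lies beyond mean +/- n_sd * SD.

    vol_a and vol_b are matched spot volumes from two CyDye channels of one gel;
    each channel is first normalized to its total signal.
    """
    total_a, total_b = sum(vol_a), sum(vol_b)
    log_ratios = [math.log((b / total_b) / (a / total_a))
                  for a, b in zip(vol_a, vol_b)]
    mu, sd = mean(log_ratios), stdev(log_ratios)
    return [i for i, r in enumerate(log_ratios) if abs(r - mu) > n_sd * sd]


cy3 = [100.0] * 20
cy5 = [100.0] * 19 + [1000.0]   # one spot roughly tenfold higher in Cy5
print(flag_changes(cy3, cy5))
```

Because the mean and SD are estimated from all spots, many genuine changes (or a noisy gel) widen the threshold for every protein, which is one reason this design has limited statistical power.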

This N = 1 type of experiment has limited statistical power, since the 95% confidence threshold is determined from the overall distribution of changes within the population (see Note 32). Many more changes in abundance, of much smaller magnitude, can be detected with much greater statistical confidence (Student’s t-test and ANOVA, Table 2) by incorporating independent replicate samples into the experiment (see Note 33). The number of replicates required in a study depends on the amount of variation in the system being investigated; increasing the number of replicates increases confidence in smaller changes in expression. The number of gel replicates needed for the experiment to have sufficient sensitivity to detect expression changes can be determined using power calculations (see, for example, (19)).

Table 2
Statistical Applications of DeCyder Biological Variation Analysis and Extended Data Analysis (EDA) Modules

Average ratio: Calculated for each protein spot feature between two groups or experimental conditions. Derived from the log standardized protein abundance changes that were directly quantified within each DIGE gel relative to the internal standard for the protein spot feature.
Student’s t-test: Univariate test of statistical significance for an abundance change between two groups or experimental conditions. p-values reflect the probability of observing a change at least this large through stochastic chance alone. With DIGE, p-values of
One-way ANOVA
Two-way ANOVA
Principal component analysis (EDA only)
Hierarchical clustering (EDA only)
K-means (EDA only)
Self-organizing maps (EDA only)
Gene shaving (EDA only)
Discriminant analysis (EDA only)
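As an illustration of the power calculations mentioned above, the sketch below uses the common normal-approximation formula for comparing two group means, n = 2 (z_(1-α/2) + z_(1-β))^2 σ^2 / δ^2 per group, where σ is the standard deviation of a spot's log10 standardized abundance and δ is the log10 fold change to be detected. This is an assumption-laden approximation, not necessarily the method of reference (19): it ignores the t distribution (so it slightly underestimates n for small groups), and σ must come from pilot data. `replicates_per_group` is a hypothetical name.

```python
import math
from statistics import NormalDist


def replicates_per_group(sd_log, fold_change, alpha=0.05, power=0.8):
    """Approximate biological replicates per group for a two-sided two-group test.

    sd_log: standard deviation of log10 standardized abundance (e.g., from pilot data).
    fold_change: smallest fold change worth detecting (e.g., 1.5).
    """
    delta = abs(math.log10(fold_change))
    z = NormalDist()                      # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)    # two-sided significance level
    z_beta = z.inv_cdf(power)             # desired power
    return math.ceil(2 * ((z_alpha + z_beta) * sd_log / delta) ** 2)


print(replicates_per_group(sd_log=0.1, fold_change=1.5))
```

With these assumed values (σ = 0.1 in log10 units, a 1.5-fold change, α = 0.05, 80% power), the formula gives about 5.06, rounded up to six replicates per group; lower variability or a larger target change reduces the number needed.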

With replicate samples, the Student’s t-test and ANOVA statistics measure the significance of each individual protein’s abundance change, independently of the overall distribution of abundance changes in the population. Incorporating replicate samples into the experimental design also controls for unexpected variation introduced into the samples during sample preparation. This design not only allows the identification of abundance changes that are consistent across multiple replicates of an experiment, but can also identify significant abundance changes that would not have been identified even if the analyses had been performed using Cy3- and Cy5-labeled samples on the same gels without the pooled-sample internal standard to coordinate them (14).
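For two groups of replicate samples, the Student's t statistic for one spot can be computed from its log standardized abundances as sketched below. This uses the classic pooled-variance form (assuming equal variances in the two groups) and returns the statistic and its degrees of freedom; converting to a p-value requires the t distribution (for example, `scipy.stats.t.sf(abs(t), df) * 2` if SciPy is available). The data and the name `student_t` are hypothetical, and DeCyder's internal implementation may differ.

```python
import math
from statistics import mean, variance


def student_t(group_a, group_b):
    """Pooled-variance Student's t statistic (B vs A) and degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(group_b) - mean(group_a)) / se, df


# log10 standardized abundances of one spot in four control and four treated samples
control = [0.02, -0.01, 0.03, 0.00]
treated = [0.21, 0.18, 0.25, 0.19]
t, df = student_t(control, treated)
print(f"t = {t:.2f}, df = {df}")
```

With df = 6, |t| greater than about 2.447 corresponds to p < 0.05 (two-sided).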

3.6.3. Multivariate Statistical Analysis

Univariate analyses such as the Student’s t-test and ANOVA have traditionally been used in DIGE experiments to provide a list of statistically significant changes in protein abundance. The application of multivariate statistical analyses (as outlined in Subheading 1.2.4) allows changes to be assessed on a global scale and can bring added insight to the usual “list of proteins” generated. Most software packages allow raw and normalized protein spot volumes to be exported for these additional statistical tests and data manipulations; in addition, the DeCyder suite of software tools now provides an Extended Data Analysis (EDA) module that includes many of these tools (Table 2). These tools are appearing increasingly often in recent DIGE publications (10,24,28,29,30,32,52). Although these multivariate analyses are especially beneficial when analyzing a DIGE experiment that contains three or more conditions, they can also be useful in two-condition comparisons to detect sample outliers, fouled samples, or even poor experimental design.

Figure 2 illustrates an example of PCA applied to a DIGE dataset comprising four experimental conditions, each measured in quadruplicate. PCA simplifies multidimensional datasets by reducing the variation to the two or three largest sources of variation. In this example, the first principal component (PC1) accounts for 62.3% of the variation among 156 proteins of interest, and the second principal component (PC2) accounts for an additional 12.5%. Each sample data point represents the collective expression profile of this subset of 156 proteins, and PC1 and PC2 orthogonally divide the samples into quadrants based on these two largest sources of variation within the DIGE dataset. In this case, these two components, which together account for approximately 75% of the variance among these proteins, cluster the samples into their proper categories (adapted from (24)).
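A PCA of the kind shown in Figure 2 can be outlined with a singular value decomposition. The sketch below assumes a samples × proteins matrix of log standardized abundances (for example, exported from the analysis software), centers each protein, and reports sample scores and the fraction of variance explained by each component. It uses NumPy, the data are synthetic, and it does not reproduce any particular package's scaling options (e.g., whether each protein is also scaled to unit variance).

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Sample scores and explained-variance fractions for the leading principal components.

    X: array of shape (n_samples, n_proteins), e.g. log standardized abundances.
    """
    Xc = X - X.mean(axis=0)                          # center each protein
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)                  # variance fraction per component
    scores = U[:, :n_components] * S[:n_components]  # sample coordinates on PC1, PC2, ...
    return scores, explained[:n_components]


# Synthetic data: 8 samples x 10 proteins; samples 4-7 have protein 0 shifted upward.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 0.05, size=(8, 10))
X[4:, 0] += 1.0
scores, explained = pca_scores(X)
print(np.round(explained, 2))
print(np.round(scores[:, 0], 2))
```

Plotting the first two columns of `scores` against each other gives a sample map like Figure 2; the sign of each component is arbitrary, so only the relative positions of the samples are meaningful.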

Figure 3 is taken from a 2D DIGE study that determined the change in protein abundance in mouse liver over a 24 h period. In this study, proteins were harvested from groups of mice during the second cycle after transfer from synchronized (12 h light:12 h dim red light) to free-running conditions (constant dim red light). Proteins were extracted from each liver and pooled from six mice per 4-h time point. Hierarchical clustering (HC; by average distance correlation) was used to investigate the expression of 49 novel circadian proteins. This yielded a range of phase groups, with 10 proteins peaking during the subjective day and 39 proteins distributed between two clusters that were most abundant during the subjective night (adapted from (32)).
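The clustering step can be sketched as agglomerative hierarchical clustering. The text does not spell out the exact settings, so this example assumes average linkage (UPGMA) with a correlation-based distance of 1 − Pearson r between protein abundance profiles across time points. The helper names and profiles are hypothetical, and profiles with zero variance would need to be filtered out first because Pearson r is undefined for them.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering with distance 1 - r; merges until n_clusters remain."""
    dist = {(i, j): 1 - pearson(profiles[i], profiles[j])
            for i, j in combinations(range(len(profiles)), 2)}
    clusters = [[i] for i in range(len(profiles))]

    def cluster_distance(a, b):
        # UPGMA: mean of all pairwise distances between members of the two clusters
        return mean(dist[min(i, j), max(i, j)] for i in a for j in b)

    while len(clusters) > n_clusters:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]))
        merged = clusters.pop(b)    # b > a, so index a is unaffected
        clusters[a].extend(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical abundance profiles over six 4-h time points
profiles = [
    [1, 2, 3, 2, 1, 0],   # peaks mid-series
    [2, 3, 4, 3, 2, 1],
    [3, 2, 1, 2, 3, 4],   # anti-phase
    [6, 4, 2, 4, 6, 8],
]
print(average_linkage(profiles, n_clusters=2))
```

A correlation-based distance groups proteins by the shape (phase) of their profiles rather than by absolute abundance, which suits the question of when in the cycle each protein peaks.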

Finally, additional information may be gleaned by mapping proteins found by DIGE to be changing onto existing biological pathways and networks. Many software solutions and services are becoming available for this type of extended analysis (e.g., KEGG pathways, Ingenuity Pathways Analysis, WebGestalt, DeCyder EDA). Although additional validation is necessary to establish biological significance, mapping the members of a “list of proteins” to established pathways and networks can provide supporting evidence for the proteins observed by DIGE alone. In some cases, it can also point to proteins associated with the biological question that were not accessible in the DIGE analysis. For example, Friedman et al. (10) recently reported the use of network/pathway mapping for proteins found by DIGE/MS in MCF10A cells overexpressing the HER2 receptor after treatment with TGF-β. The majority of proteins identified with DIGE/MS mapped to a network of pathways with TGF-β as a major hub, but the network also included an intercalating pathway involving p53 that affected many proteins independently identified in the DIGE/MS experiments. This insight, linking new players to those identified with DIGE/MS, led to further investigation of a direct role for p53 in the expression of the tumor suppressor maspin (53).
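A common statistical basis for mapping a protein list onto pathways is an over-representation test. The sketch below computes a one-sided hypergeometric p-value for observing at least k pathway members among n changing proteins, given K pathway members in a background of N proteins. It is a generic illustration: the tools named above may use different tests, background sets, and multiple-testing corrections, and `enrichment_p` and the example numbers are hypothetical.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided hypergeometric P(X >= k).

    k: changing proteins that map to the pathway
    n: changing proteins in the list
    K: pathway members in the background
    N: proteins in the background
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical: 5 of 30 DIGE hits fall in a 40-member pathway, background of 5000
print(f"{enrichment_p(5, 30, 40, 5000):.2e}")
```

Here only about 0.24 pathway members would be expected among 30 randomly chosen proteins, so finding five gives a very small p-value; when many pathways are tested, the p-values should be corrected for multiple testing.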

4. Notes

1. 2DE has traditionally been a popular method for differential display proteomics on a global scale, but until recently these strategies lacked the ability to directly quantify abundance changes in the same fashion as stable isotope LC/MS-based strategies (2,3,4). This has been mainly due to the inability to directly correlate migration patterns and protein staining between gel separations (gel-to-gel variation). Stable isotopes have been used in gel-based proteomics as well, whereby different proteomes are separately labeled with different stable isotopes (e.g., by growing cells in ¹⁴N- vs. ¹⁵N-labeled medium) prior to mixing and running them together through the same 2DE separation (5). In this case, abundance changes can be monitored on individual proteins during the mass spectrometry (MS) stage, but this requires in-gel digestion and MS of every protein present to discover the subset of proteins that is changing.

2. Both hydrophobicity and molecular weight influence how proteins migrate during SDS-PAGE, which yields information on apparent molecular mass.

3. In comparison, commonly used silver or colloidal Coomassie blue stains (ca. 5–10 ng sensitivity) typically exhibit a dynamic range of less than two orders of magnitude (8,9). The CyDye labeling system is compatible with the downstream processing commonly used to identify proteins via MS and database interrogation, which involves the generation of tryptic peptides within excised gel plugs. Trypsin cleaves peptide bonds on the C-terminal side of lysine and arginine residues, but peptide generation is mostly unhindered because so few lysine residues are modified by dye labeling.

4. DIGE experiments can still be performed using the internal standard methodology with only two CyDyes, but twice as many gels are required to analyze the same number of samples compared with the three-dye minimal labeling scheme. With saturation labeling, one dye is used to label the internal standard, and the other is used to label individual samples. A dye-swap scheme is not necessary in this case because the individual samples are always labeled with the same CyDye.

5. Hydroxyethyl disulfide (commercially available as “DeStreak reagent”), combined with anodic cup loading, should be used to enhance resolution for IEF above pH 8 (11).

6. Running every DIGE gel with the maximal amount of protein (without adversely affecting first-dimension resolution) not only enables detection of lower-abundance proteins but also provides more material for subsequent protein identification using MS. This makes every gel in a coordinated DIGE experiment a “pick-able” gel, without the need to run additional preparative gels with increased protein load that then have to be carefully matched to a lower-load analytical gel. When combined with narrow-range IEF, maximizing the protein amount also allows interrogation of the lower-abundance proteins in a sample.

7. If one sample within a study has a very skewed protein distribution compared with the others, then many of the “novel proteins” within this sample will effectively be diluted out in the pool. Such a sample outlier can be easily identified using the multivariate statistical analyses described.

8. Replication not only enables the identification of subtle differences with statistical confidence but is also vital to control for nonbiological variation. In most cases biological variation will outweigh technical variation; therefore, only biological replicates are necessary. It is thus important that each replicate sample is derived from an independent experiment, ideally performed on different occasions and perhaps using different batches of medium. The independent samples can then be analyzed coordinately using the pooled-sample internal standard methodology. See Table 1 for an example of this design.

9. All solutions should be prepared using water with a resistivity of 18.2 MΩ·cm; this is referred to as “water” throughout the text.

10. Mix equal parts of butanol and water and shake vigorously. Let the two phases separate overnight, and use the butanol phase for the overlay. Butanol that is not completely water-saturated can extract water from the top of the gel. A more recent improvement is to use a 0.1% SDS solution in a conventional spray bottle to carefully spray a fine mist over the gels so that their tops are thoroughly covered (the gel/overlay interface will not be as obvious).

11. DMF can degrade, producing amines, which can react with the NHS-ester CyDyes. DMF stocks should be kept fresh (



dimension due to MW and hydrophobicity shifts. Overlabeling results in side reactions with the epsilon-amine groups of lysine side chains, and because the maleimide dyes do not carry a compensatory charge, this results in the overall loss of a charge, creating a series of isoelectric forms in the first dimension (“charge trains”). The labeling buffer should not contain any components with free thiols, as these will react with the satCyDyes.

17. The L-buffer volume can be increased if necessary for complete resolubilization, although 100–250 μg or more should resolubilize readily in this volume. When cup loading is used for sample entry, the volume of labeling buffer used for resolubilization should not exceed 40 μL per sample, to ensure that the final volume will not exceed the capacity of the loading cup (ca. 100–150 μL).

18. These methods assume that all gels to be run will be used for both analytical (quantification) and preparative (providing material for subsequent MS) purposes. The manufacturer’s current recommendation is to label 50 μg of sample with 400 pmol CyDye. A sufficient amount of unlabeled sample can then be added to the quenched reactions to reach the final protein amounts needed for subsequent MS. Alternatively, many have found that the ratios can be adjusted to label larger amounts of sample (up to 200 μg with 200 pmol dye) without adversely affecting the overall labeling reaction (the approach presented here).

19. If samples are to be introduced using anodic cup loading, simply bring this mixture up to 100 μL in R-buffer and proceed with cup loading. R-buffer can always be supplemented with additional DTT using the 200 mg/mL DTT stock solution. In the presence of DeStreak reagent, used for focusing in pH ranges above pH 8, the addition of an equal volume of 1× R-buffer should provide a sufficient amount of DTT without interfering with the DeStreak reagent.

20. Comparison of minimally labeled 2D protein maps with unlabeled protein maps is generally not a problem: the addition of a single dye molecule causes only small alterations in protein mobility, which generally do not prevent facile matching between the 2–5% of labeled protein and the remaining unlabeled protein that will provide enough material for MS.

21. Poststaining is not necessary with saturation DIGE, since an unlabeled population with potentially different migration characteristics will not exist.

22. This treatment binds the gel to one of the glass plates and therefore prevents shrinking/swelling during the poststaining and protein excision processes, thereby facilitating accurate robotic protein excision. Nothing should be placed on top of the wipes covering bind silane-treated plates, as this may leave impressions that are detected during the scanning phase. Assembling and casting too soon may create a binding surface on the opposite glass plate, preventing the gel from being subsequently poststained and picked. Automated protein excision can be facilitated for certain systems by placing fluorescent alignment reference targets on the plate, which can be done at this stage.

23. A stacking gel is not required for 2D gel electrophoresis, as the proteins are effectively “stacked” to the height of the IPG strip. SDS is also not essential in the separating gel, as the SDS associated with the proteins during the equilibration step, and present in the running buffer, is sufficient (although many traditionally use it in the separating gel). Using 2× concentration running buffer in the upper buffer chamber can produce higher quality separations in some circumstances.

24. Samples of a similar nature should always be focused simultaneously for optimal reproducibility. Focusing programs vary for some pH gradients. A typical program for many ranges is 500 V for 500 V-h, stepping to 1000 V for 1000 V-h, followed by a final step to 8000 V until >50 kV-h has been reached. Check the recommendations of specific vendors.

25. A large volume of equilibration buffer should be used to ensure sufficient removal of ampholines and other components of the first-dimension run.

26. Carefully wash off any remaining liquid on top of the SDS-PAGE gel. Prewet the IPG strip with 1× running buffer and place the strip between the gel plates with the plastic backing adhering to the inside surface of one of the glass plates. The running buffer used for prewetting helps the IPG strip slide down the inside surface of the plate onto the top of the SDS-PAGE gel.

27. An agarose overlay, used by many protocols, is not absolutely necessary to ensure proper contact between the IPG strip and the second-dimension SDS-PAGE gel. Using a thin card or ruler to carefully tamp the IPG strip down onto the gel is usually sufficient and avoids problems associated with the overlay, such as air bubbles trapped in the solidified agarose.

28. Running gels at less than 1 W/gel can improve resolution in the high-molecular-weight regions of the second-dimension gel. Use a wattage appropriate for the second-dimension unit being used; many gel units can accommodate increased power because they compensate for the increased heat.

29. Absorption/emission maxima in DMF are 491/506 nm for Cy2, 553/572 nm for Cy3, and 648/669 nm for Cy5. However, care must be taken to scan in regions of each spectrum that do not overlap the absorbance or emission of the other dyes, which may mean using a nonmaximal region of a given spectrum.

30. Comparison of 2D spot maps between saturation-labeled samples and minimally labeled or unlabeled samples is impossible, as proteins containing multiple cysteine residues may appear as significantly larger Mr species when labeled with the saturation dyes, which cannot be predicted without first knowing the protein identity.

31. Almost all software packages for 2D electrophoresis involve matching protein spot patterns between gels. In DeCyder, matching is performed in the BVA module to align the quantitative data obtained in the DIA module from the triply coresolved protein signals on each gel (where gel-to-gel variation does not come into play). Manual verification of the matching is almost always required with any software package.

32. There are many “all-or-none” types of experiments in which a single-gel comparison may be valid because subtle changes are not expected. Nevertheless, using independent replicates and the pooled-sample internal standard methodology is still needed to control for nonbiological sample preparation error.



33. The multigel approach allows many data points to be collected for each group to be compared. Spots of interest can be selected by looking for significant change across the groups. Student’s t-test and ANOVA probability scores (p) indicate the probability of observing a change at least as large as the one observed through stochastic, random events alone (i.e., if the null hypothesis of no change were true). Probability values



9. Lilley, K.S., Razzaq, A. and Dupree, P. (2002) Two-dimensional gel electrophoresis: recent advances in sample preparation, detection and quantitation. Curr Opin Chem Biol 6(1):46–50.

10. Friedman, D.B., Wang, S.E., Whitwell, C.W., Caprioli, R.M. and Arteaga, C.L. (2007) Multi-variable difference gel electrophoresis and mass spectrometry: A case study on TGF-beta and ErbB2 signaling. Mol Cell Proteomics 6:150–69.

11. Olsson, I., Larsson, K., Palmgren, R. and Bjellqvist, B. (2002) Organic disulfides as a means to generate streak-free two-dimensional maps with narrow range basic immobilized pH gradient strips as first dimension. Proteomics 2(11):1630–32.

12. Wolters, D.A., Washburn, M.P. and Yates, J.R. 3rd (2001) An automated multidimensional protein identification technology for shotgun proteomics. Anal Chem 73(23):5683–90.

13. Alban, A., David, S.O., Bjorkesten, L., Andersson, C., Sloge, E., Lewis, S. and Currie, I. (2003) A novel experimental design for comparative two-dimensional gel analysis: two-dimensional difference gel electrophoresis incorporating a pooled internal standard. Proteomics 3(1):36–44.

14. Friedman, D.B., Hill, S., Keller, J.W., Merchant, N.B., Levy, S.E., Coffey, R.J. and Caprioli, R.M. (2004) Proteome analysis of human colon cancer by two-dimensional difference gel electrophoresis and mass spectrometry. Proteomics 4(3):793–811.

15. Gerbasi, V.R., Weaver, C.M., Hill, S., Friedman, D.B. and Link, A.J. (2004) Yeast Asc1p and mammalian RACK1 are functionally orthologous core 40S ribosomal proteins that repress gene expression. Mol Cell Biol 24(18):8276–87.

16. Sitek, B., Luttges, J., Marcus, K., Kloppel, G., Schmiegel, W., Meyer, H.E., Hahn, S.A. and Stuhler, K. (2005) Application of fluorescence difference gel electrophoresis saturation labelling for the analysis of microdissected precursor lesions of pancreatic ductal adenocarcinoma. Proteomics 5(10):2665–79.

17. Hu, Y., Malone, J.P., Fagan, A.M., Townsend, R.R. and Holtzman, D.M. (2005) Comparative proteomic analysis of intra- and interindividual variation in human cerebrospinal fluid. Mol Cell Proteomics 4(12):2000–9.

18. Zhang, X., Guo, Y., Song, Y., Sun, W., Yu, C., Zhao, X., Wang, H., Jiang, H., Li, Y., Qian, X., Jiang, Y. and He, F. (2006) Proteomic analysis of individual variation in normal livers of human beings using difference gel electrophoresis. Proteomics 6(19):5260–68.

19. Karp, N.A., Spencer, M., Lindsay, H., O’Dell, K. and Lilley, K.S. (2005) Impact of replicate types on proteomic expression analysis. J Proteome Res 4(5):1867–71.

20. Meunier, B., Dumas, E., Piec, I., Bechet, D., Hebraud, M. and Hocquette, J.F. (2007) Assessment of hierarchical clustering methodologies for proteomic data mining. J Proteome Res 6(1):358–66.

21. Fodor, I.K., Nelson, D.O., Alegria-Hartman, M., Robbins, K., Langlois, R.G., Turteltaub, K.W., Corzett, T.H. and McCutchen-Maloney, S.L. (2005) Statistical challenges in the analysis of two-dimensional difference gel electrophoresis experiments using DeCyder. Bioinformatics 21(19):3733–40.



22. Karp, N., Kreil, D. and Lilley, K. (2004) Determining a significant change in<br />

protein expression with DeCyderTM during a pair-wise comparison using twodimensional<br />

difference gel electrophoresis. Proteomics 4(5):1421–32.<br />

23. Kreil, D., Karp, N. and Lilley, K. (2004) DNA microarray normalization methods<br />

can remove bias from differential protein expression analysis of 2-D difference gel<br />

electrophoresis results. Bioinformatics 20(13):2026–34.<br />

24. Friedman, D.B., Stauff, D.L., Pishchany, G., Whitwell, C.W., Torres, V.J. and<br />

Skaar, E.P. (2006) Staphylococcus aureus redirects central metabolism to increase<br />

iron availability. PLoS Pathog 2(8):e87.<br />

25. Fujii, K., Kondo, T., Yamada, M., Iwatsuki, K. and Hirohashi, S. (2006) Toward<br />

a comprehensive quantitative proteome database: protein expression map of<br />

lymphoid neoplasms by 2-D DIGE and MS. Proteomics 3:3.<br />

26. Fujii, K., Kondo, T., Yokoo, H., Yamada, T., Matsuno, Y., Iwatsuki, K. and<br />

Hirohashi, S. (2005) Protein expression pattern distinguishes different lymphoid<br />

neoplasms. Proteomics 5(16):4274–86.<br />

27. Karp, N.A., Griffin, J.L. and Lilley, K.S. (2005) Application of partial least squares<br />

discriminant analysis to two-dimensional difference gel studies in expression<br />

proteomics. Proteomics 5(1):81–90.<br />

28. Seike, M., Kondo, T., Fujii, K., Yamada, T., Gemma, A., Kudoh, S. and<br />

Hirohashi, S. (2004) Proteomic signature of human cancer cells. Proteomics<br />

4(9):2776–88.<br />

29. Suehara, Y., Kondo, T., Fujii, K., Hasegawa, T., Kawai, A., Seki, K., Beppu, Y.,<br />

Nishimura, T., Kurosawa, H. and Hirohashi, S. (2006) Proteomic signatures<br />

corresponding to histological classification and grading of soft-tissue sarcomas.<br />

Proteomics 6(15):4402–09.<br />

30. Hatakeyama, H., Kondo, T., Fujii, K., Nakanishi, Y., Kato, H., Fukuda, S. and<br />

Hirohashi, S. (2006) Protein clusters associated with carcinogenesis, histological<br />

differentiation and nodal metastasis in esophageal cancer. Proteomics 6(23):<br />

6300–16.<br />

31. Verhoeckx, K.C., Gaspari, M., Bijlsma, S., van der Greef, J., Witkamp, R.F.,<br />

Doornbos, R.P. and Rodenburg, R.J. (2005) In search of secreted protein<br />

biomarkers for the anti-inflammatory effect of beta2-adrenergic receptor agonists:<br />

application of DIGE technology in combination with multivariate and univariate<br />

data analysis tools. J Proteome Res 4(6):2015–23.<br />

32. Reddy, A.B., Karp, N.A., Maywood, E.S., Sage, E.A., Deery, M., O’Neill,<br />

J.S., Wong, G.K., Chesham, J., Odell, M., Lilley, K.S., Kyriacou, C.P. and<br />

Hastings, M.H. (2006) Circadian orchestration of the hepatic proteome. Curr Biol<br />

16(11):1107–15.<br />

33. Lee, I.N., Chen, C.H., Sheu, J.C., Lee, H.S., Huang, G.T., Yu, C.Y., Lu, F.J.<br />

and Chow, L.P. (2005) Identification of human hepatocellular carcinoma-related

biomarkers by two-dimensional difference gel electrophoresis and mass<br />

spectrometry. J Proteome Res 4(6):2062–69.<br />

34. Liang, C.R., Leow, C.K., Neo, J.C., Tan, G.S., Lo, S.L., Lim, J.W., Seow, T.K.,<br />

Lai, P.B. and Chung, M.C. (2005) Proteome analysis of human hepatocellular


carcinoma tissues by two-dimensional difference gel electrophoresis and mass<br />

spectrometry. Proteomics 5(8):2258–71.<br />

35. Nabetani, T., Tabuse, Y., Tsugita, A. and Shoda, J. (2005) Proteomic analysis of<br />

livers of patients with primary hepatolithiasis. Proteomics 5(4):1043–61.<br />

36. Huang, H.L., Stasyk, T., Morandell, S., Dieplinger, H., Falkensammer, G., Griesmacher,<br />

A., Mogg, M., Schreiber, M., Feuerstein, I., Huck, C.W., Stecher, G.,<br />

Bonn, G.K. and Huber, L.A. (2006) Biomarker discovery in breast cancer serum<br />

using 2-D differential gel electrophoresis/MALDI-TOF/TOF and data validation

by routine clinical assays. Electrophoresis 27(8):1641–50.<br />

37. Somiari, R.I., Sullivan, A., Russell, S., Somiari, S., Hu, H., Jordan, R., George, A.,<br />

Katenhusen, R., Buchowiecka, A., Arciero, C., Brzeski, H., Hooke, J. and<br />

Shriver, C. (2003) High-throughput proteomic analysis of human infiltrating ductal<br />

carcinoma of the breast. Proteomics 3(10):1863–73.<br />

38. Nishimori, T., Tomonaga, T., Matsushita, K., Oh-Ishi, M., Kodera, Y., Maeda, T.,<br />

Nomura, F., Matsubara, H., Shimada, H. and Ochiai, T. (2006) Proteomic analysis<br />

of primary esophageal squamous cell carcinoma reveals downregulation of a cell<br />

adhesion protein, periplakin. Proteomics 6(3):1011–18.<br />

39. Zhou, G., Li, H., DeCamp, D., Chen, S., Shu, H., Gong, Y., Flaig, M.,<br />

Gillespie, J.W., Hu, N., Taylor, P.R., Emmert-Buck, M.R., Liotta, L.A.,<br />

Petricoin, E.F. 3rd and Zhao, Y. (2002) 2D differential in-gel electrophoresis for<br />

the identification of esophageal scans cell cancer-specific protein markers. Mol<br />

Cell Proteomics 1(2):117–24.<br />

40. Yu, K.H., Rustgi, A.K. and Blair, I.A. (2005) Characterization of proteins in<br />

human pancreatic cancer serum using differential gel electrophoresis and tandem<br />

mass spectrometry. J Proteome Res 4(5):1742–51.<br />

41. Wan, J., Sun, W., Li, X., Ying, W., Dai, J., Kuai, X., Wei, H., Gao, X., Zhu, Y.,<br />

Jiang, Y., Qian, X. and He, F. (2006) Inflammation inhibitors were remarkably upregulated<br />

in plasma of severe acute respiratory syndrome patients at progressive<br />

phase. Proteomics 6(9):2886–94.<br />

42. Greengauz-Roberts, O., Stoppler, H., Nomura, S., Yamaguchi, H., Goldenring, J.R.,<br />

Podolsky, R.H., Lee, J.R. and Dynan, W.S. (2005) Saturation labeling with<br />

cysteine-reactive cyanine fluorescent dyes provides increased sensitivity for<br />

protein expression profiling of laser-microdissected clinical specimens. Proteomics<br />

5(7):1746–57.<br />

43. Kondo, T., Seike, M., Mori, Y., Fujii, K., Yamada, T. and Hirohashi, S. (2003)<br />

Application of sensitive fluorescent dyes in linkage of laser microdissection and<br />

two-dimensional gel electrophoresis as a cancer proteomic study tool. Proteomics<br />

3(9):1758–66.<br />

44. Sitek, B., Potthoff, S., Schulenborg, T., Stegbauer, J., Vinke, T., Rump, L.C.,<br />

Meyer, H.E., Vonend, O. and Stuhler, K. (2006) Novel approaches to analyse<br />

glomerular proteins from smallest scale murine and human samples using DIGE<br />

saturation labelling. Proteomics 3:3.<br />

45. Tetu, B., Lacasse, B., Bouchard, H.L., Lagace, R., Huot, J. and Landry, J. (1992)<br />

Prognostic influence of HSP-27 expression in malignant fibrous histiocytoma:


a clinicopathological and immunohistochemical study. Cancer Res 52(8):<br />

2325–28.<br />

46. Wessel, D. and Flügge, U.I. (1984) A method for the quantitative recovery of

protein in dilute solution in the presence of detergents and lipids. Anal Biochem<br />

138(1):141–43.<br />

47. Knowles, M.R., Cervino, S., Skynner, H.A., Hunt, S.P., de Felipe, C., Salim, K.,<br />

Meneses-Lorente, G., McAllister, G. and Guest, P.C. (2003) Multiplex proteomic<br />

analysis by two-dimensional differential in-gel electrophoresis. Proteomics<br />

3:1162–71.<br />

48. Prabakaran, S., Swatton, J.E., Ryan, M.M., Huffaker, S.J., Huang, J.J., Griffin, J.L.,<br />

Wayland, M., Freeman, T., Dudbridge, F., Lilley, K.S., Karp, N.A., Hester, S.,<br />

Tkachev, D., Mimmack, M.L., Yolken, R.H., Webster, M.J., Torrey, E.F. and<br />

Bahn, S. (2004) Mitochondrial dysfunction in schizophrenia: evidence for compromised<br />

brain metabolism and oxidative stress. Mol Psychiatry 9(7):684–97.<br />

49. Wang, D., Jensen, R., Gendeh, G., Williams, K. and Pallavicini, M.G. (2004)<br />

Proteome and transcriptome analysis of retinoic acid-induced differentiation of<br />

human acute promyelocytic leukemia cells, NB4. J Proteome Res 3(3):627–35.<br />

50. Zhang, W. and Chait, B.T. (2000) ProFound: an expert system for protein<br />

identification using mass spectrometric peptide mapping information. Anal Chem<br />

72(11):2482–89.<br />

51. Zhang, Y.Q., Matthies, H.J., Mancuso, J., Andrews, H.K., Woodruff, E. 3rd,<br />

Friedman, D. and Broadie, K. (2004) The Drosophila fragile X-related gene<br />

regulates axoneme differentiation during spermatogenesis. Dev Biol 270(2):<br />

290–307.<br />

52. Yokoo, H., Kondo, T., Fujii, K., Yamada, T., Todo, S. and Hirohashi, S. (2004)<br />

Proteomic signature corresponding to alpha fetoprotein expression in liver cancer<br />

cells. Hepatology 40(3):609–17.<br />

53. Wang, S.E., Narasanna, A., Whitwell, C.W., Wu, F.Y., Friedman, D.B. and

Arteaga, C.L. (2007) Convergence of P53 and TGFbeta signaling on activating<br />

expression of the tumor suppressor gene maspin in mammary epithelial cells. J<br />

Biol Chem 4:4.


7<br />

MALDI/SELDI Protein Profiling of Serum<br />

for the Identification of Cancer Biomarkers<br />

Lisa H. Cazares, Jose I. Diaz, Rick R. Drake, and O. John Semmes<br />

Summary<br />

The ability to visualize the full depth of the serum proteome in a high-throughput manner is a major goal of clinical proteomics. Methodologies that combine higher throughput with the ability to observe differential protein expression levels have been applied to this goal. One example is the coupling of robotic sample processing to matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS). Within this paradigm is a modification of MALDI-TOF termed surface-enhanced laser desorption/ionization-TOF (SELDI-TOF). Both conventional MALDI and SELDI have been used to generate protein expression profiles that reflect potential peptide changes in serum. This information can be used to identify proteins that may enable new diagnostic and therapeutic strategies.

Key Words: matrix-assisted laser desorption ionization; surface-enhanced laser<br />

desorption ionization; mass spectrometry; protein profiling; proteomics.<br />

1. Introduction<br />

Mining the serum proteome for the discovery of new biomarkers is a major goal of many clinical proteomics efforts. Surface-enhanced laser desorption/ionization (SELDI) and matrix-assisted laser desorption/ionization (MALDI) have been used extensively for protein profiling in efforts to discover biomarkers in serum from patients with prostate, lung, head and neck, ovarian, and colon cancer (1,2,3,4,5,6). MALDI techniques usually require some up-front fractionation of the serum to reduce the complexity of the sample (7,8,9), and the ease of sample fractionation is considered an advantage of SELDI. An advantage of MALDI-TOF instrumentation is its improved resolution over SELDI instruments and the ability to directly identify peaks of interest by analyzing samples in TOF/TOF mode. For routine linear-mode profiling, both types of instrumentation give similar results with human serum (see Fig. 1).

From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols

Edited by: A. Vlahou © Humana Press, Totowa, NJ

Besides the instrumentation and methodologies related to mass spectrometry analysis, the quality and quantity of the clinical samples to be tested are important considerations. Serum is one of the most common sample types used in biomarker discovery because it is routinely obtained in the clinic, a large proportion of the blood clotting factors has been removed, and it is a rich source of molecules that may indicate systemic function. Blood plasma is an alternative source; however, clinical plasma collection uses various anticoagulants, which should be standardized to allow for universal analysis. Whether serum or plasma is used, every effort should be made to standardize the sample collection and processing protocols. Several studies have highlighted this need and determined that multiple factors can affect the spectra generated from serum samples (10,11). These factors include the time elapsed between venipuncture and separation of plasma or serum, the type of serum collection tube, storage conditions, and the number of freeze–thaw cycles. In our laboratory, we routinely use serum for proteomic profiling. The following protocols outline our method for collection and storage of serum samples for subsequent analysis via MALDI-MS.

[Figure 1: (A) "Bruker IMAC Cu2+ beads"; (B) "Ciphergen IMAC Cu2+ chip"; m/z axis 4000–10,000; peaks 1, 2 and 3 marked as the "three primary peaks used for instrument standardization".]

Fig. 1. Comparison of SELDI and MALDI spectra using QC sera. (A) MALDI spectra generated from QC serum processed with IMAC-Cu2+ magnetic beads. (B) SELDI spectra from QC serum processed on IMAC-Cu2+ chips. The three peaks used for instrument optimization are indicated.

Reduction of sample complexity is an essential step in generating high-quality TOF mass spectrometry data from serum. One method of MALDI sample preparation reduces the complexity of serum while remaining robust and easily amenable to automated high-throughput applications: sample fractionation using magnetic beads (MBs) combined with prestructured MALDI sample supports (AnchorChip technology). Several MB types with different surface chemistries can be used to fractionate serum and increase the number of detectable peaks (12) (see Fig. 2). In addition, depletion of highly abundant proteins such as albumin and IgG (13,14) reduces ion suppression and reveals less abundant species. Unfortunately, fractionation greatly increases the number of samples to be processed, which in turn increases the complexity of the experimental procedure. Processing is therefore best done with robotics, which increases throughput and produces reproducible results; however, small sample sets can be processed manually with careful attention to detail, using the protocols and methods in this chapter. Another caveat of depletion strategies is that highly abundant proteins such as albumin bind low-abundance species (15,16), which are then removed along with the carrier protein. For comprehensive biomarker discovery, the benefits of depletion and fractionation often outweigh these drawbacks. We have used both depleted and nondepleted serum strategies for biomarker discovery, and this continues to be a major area of methodological development.

[Figure 2: MALDI spectra, Intens. [a.u.] vs. m/z 1000–10,000, for four bead fractions: WCX = 84 peaks, IMAC = 85 peaks, C18 = 62 peaks, WAX = 80 peaks; 203 total unique peaks.]

Fig. 2. MALDI spectra of serum fractionated with magnetic beads. Example spectra produced on the Ultraflex-TOF/TOF when serum is fractionated with different magnetic bead types. A total of 203 unique peaks are resolved in the m/z range of 1000–10,000.
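The same species is often captured by more than one bead chemistry, so per-fraction peak counts overlap. The four counts in Fig. 2 sum to 311, yet only 203 peaks are unique. The sketch below shows one way to count unique peaks across fractions; it is not the software used to produce Fig. 2. It assumes that peaks from different fractions represent the same species when their m/z values agree within a relative tolerance. The function name `merge_peak_lists`, the 0.2% tolerance, and the example peak lists are hypothetical.

```python
def merge_peak_lists(peak_lists, rel_tol=0.002):
    """Merge peak lists from several fractions into unique peaks.

    peak_lists: iterable of lists of m/z values, one list per bead type.
    Peaks are pooled and sorted. A peak joins the current cluster when it lies
    within rel_tol (relative) of that cluster's first m/z; otherwise it starts
    a new cluster. Each cluster is reported by its mean m/z.
    """
    pooled = sorted(mz for peaks in peak_lists for mz in peaks)
    clusters = []
    for mz in pooled:
        if clusters and mz - clusters[-1][0] <= rel_tol * clusters[-1][0]:
            clusters[-1].append(mz)
        else:
            clusters.append([mz])
    return [sum(c) / len(c) for c in clusters]


if __name__ == "__main__":
    # Hypothetical peak lists for two bead types (m/z values are illustrative).
    fraction_a = [2662.3, 4211.0, 5336.0]
    fraction_b = [2663.4, 4212.3, 7762.3]
    unique = merge_peak_lists([fraction_a, fraction_b])
    print(f"{len(fraction_a) + len(fraction_b)} peaks in total, {len(unique)} unique")
```

Each cluster is anchored to its first (lowest) m/z. This keeps a chain of closely spaced peaks from merging without limit. The tolerance should reflect the mass accuracy actually achieved in linear mode.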

2. Materials<br />

2.1. Serum Collection and Storage<br />

1. Becton Dickinson Vacutainer serum separator tube (SST) Plus blood collection

tube (16 mm × 100 mm, draw volume 8.5 mL) (Becton Dickinson #367988)

2. Screw cap microtubes for cryo-storage (2.0 mL) (Sarstedt Inc.# 72.609.001, with<br />

caps # 65.716)<br />

3. Microcentrifuge tubes for aliquots (1.7 mL) (Corning-Costar #3620)<br />

2.2. Serum Processing for MALDI Using MB-Based Fractionation<br />

1. The MB kit(s) (immobilized metal affinity-Cu, hydrophobic interaction, weak

cation exchange, or weak anion exchange) (Bruker Daltonics, Billerica, MA)

2. Optional: ClinProt robotic workstation (Bruker Daltonics)<br />

3. Magnetic separators for manual processing: large (1.5 mL) or small tube (0.5 mL)<br />

format (Bruker Daltonics)<br />

4. α-Cyano-4-hydroxycinnamic acid (CHCA) (Bruker Daltonics)

5. Ethanol ultra pure 100%<br />

6. Acetone ultra pure 100%<br />

7. Micropipette capable of delivering 1 μL accurately<br />

8. Peptide standard mix (Bruker Daltonics)

9. Microtiter plate AnchorChip 600/384 MALDI target 600 μm diameter (Bruker<br />

Daltonics)<br />

2.3. Serum Processing for SELDI<br />

1. Water high performance liquid chromatography (HPLC) grade (Fisher Scientific,<br />

Hampton, NH)


2. Copper sulfate, anhydrous (Sigma-Aldrich, St. Louis, MO)<br />

3. Sodium acetate trihydrate salt<br />

4. Phosphate buffered saline (PBS) buffer pH 7.4<br />

5. Urea, at least 99% pure (Promega, Madison, WI)

6. CHAPS ultra purity (Fisher Scientific)<br />

7. Sinapinic acid (SPA) (5 mg tubes) (Ciphergen Biosystems, Palo Alto, CA)

8. IMAC protein chip arrays (Ciphergen)<br />

9. Bioprocessor holder (Ciphergen) for processing 12 chips in a 96-well

format

10. Bioprocessor accessory, 96-well disposable reservoir and gasket (Ciphergen)<br />

11. Acetonitrile ultra high purity grade<br />

12. Trifluoroacetic acid (TFA) (100%, 1 mL ampules) [Sigma/Aldrich Chemical<br />

Company 26,977-8, (589-37-37)]<br />

13. Plate seals<br />

14. For calibration (all from Ciphergen Biosystems): NP20 ProteinChip arrays,

All-in-one peptide standard, and All-in-one protein standard

15. Optional: BioMek 2000 robotic workstation, adapted to process ProteinChip

arrays (Ciphergen Biosystems)

16. DPC MicroMix 5 shaker (Diagnostic Products Corporation, Los Angeles, CA)

or another type of rotary or platform shaker

17. Micropipet capable of delivering 1 μL accurately

18. Pooled serum for quality control (QC)

19. 100 mM CuSO4 in water [room temperature (RT)]: 1.6 g CuSO4 (MW = 159.6)

made up to 100 mL in HPLC grade water

20. 100 mM sodium acetate, pH 4.0 (RT): 9.0 mL of 0.2 M sodium acetate stock

(27.2 g/L), 50 mL HPLC water, and 41.0 mL of 0.2 M acetic acid (11.6 mL/L made

from concentrated acetic acid); add the acetic acid gradually until the pH reaches 4.0.

21. PBS buffer pH 7.4 (RT): 10 mL 10× PBS buffer made up to 100 mL in

HPLC water. Check pH.

22. 10% TFA stock: 1 mL TFA (100%), 9 mL HPLC water (store in amber bottle)

23. 1% TFA working solution (store in amber bottle and make fresh every 2 weeks):

take 1 mL 10% TFA and add 9 mL HPLC water

24. 8 M urea, 1% CHAPS in PBS, pH 7.4: add PBS pH 7.4 to 48.05 g urea to a volume

of up to 90 mL and stir until dissolved (warming may be needed). Add 1 g CHAPS. Bring

the final volume to 100 mL with PBS. Filter through a 0.4 μm filter. Aliquot into 5 mL

volumes and freeze.

25. 1 M urea, 0.125% CHAPS in PBS, pH 7.4: dilute the 8 M stock above in PBS

(100 mL of 8 M stock in 700 mL PBS).
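The masses and volumes in the buffer recipes above follow from simple solution arithmetic, which is worth rechecking whenever batch sizes change. The sketch below assumes standard molecular weights for urea (60.06 g/mol) and sodium acetate trihydrate (136.08 g/mol) and a pKa of 4.76 for acetic acid. It ignores activity effects, so the calculated pH of the acetate recipe (about 4.1) is only a starting point. This is why the acetic acid is added gradually while the pH is checked. Function names are illustrative.

```python
import math

# Molecular weights (g/mol). The CuSO4 value is the one given in the recipe;
# urea and sodium acetate trihydrate are standard values assumed here.
MW = {"CuSO4": 159.6, "urea": 60.06, "NaOAc_3H2O": 136.08}
ACETIC_ACID_PKA = 4.76  # assumed literature value at 25 degrees C


def grams_needed(molarity_m, volume_l, mw_g_per_mol):
    """Mass of solute (g) for a given molarity (mol/L) and final volume (L)."""
    return molarity_m * volume_l * mw_g_per_mol


def buffer_ph(pka, base_mol, acid_mol):
    """Henderson-Hasselbalch estimate of buffer pH (ignores activity effects)."""
    return pka + math.log10(base_mol / acid_mol)


def stock_volume(stock_conc, final_conc, final_volume):
    """Stock volume needed for a dilution, from C1 * V1 = C2 * V2."""
    return final_conc * final_volume / stock_conc


if __name__ == "__main__":
    print(f"CuSO4, 100 mM in 100 mL: {grams_needed(0.100, 0.100, MW['CuSO4']):.2f} g")
    print(f"Urea, 8 M in 100 mL: {grams_needed(8.0, 0.100, MW['urea']):.2f} g")
    print(f"NaOAc.3H2O, 0.2 M stock: {grams_needed(0.2, 1.0, MW['NaOAc_3H2O']):.1f} g/L")
    # 9.0 mL of 0.2 M acetate (base) + 41.0 mL of 0.2 M acetic acid, made up to 100 mL.
    ph = buffer_ph(ACETIC_ACID_PKA, 9.0e-3 * 0.2, 41.0e-3 * 0.2)
    total_acetate = (9.0 + 41.0) * 0.2 / 100.0  # mol/L after making up to 100 mL
    print(f"Acetate buffer: {total_acetate * 1000:.0f} mM, estimated pH {ph:.2f}")
    # 1 M urea working solution: 8 M stock diluted to 800 mL total.
    print(f"8 M stock per 800 mL of 1 M urea: {stock_volume(8.0, 1.0, 800.0):.0f} mL")
```

Run as a script, this prints 1.60 g, 48.05 g, 27.2 g/L, 100 mM at an estimated pH of 4.10, and 100 mL. These values agree with the recipes.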

2.4. SELDI and MALDI Spectra Acquisition<br />

1. SELDI PBS II, IIc, or PCS 4000 instrument (Ciphergen biosystems)<br />

2. Ultraflex I or II MALDI-TOF–TOF (Bruker Daltonics)


3. Method<br />

3.1. Serum Collection<br />

Obtain appropriate informed consent from each patient before collection.

1. Perform venipuncture into a 10 cc SST vacutainer tube (without anticoagulant).<br />

2. Allow blood to clot at RT for 30 min.<br />

3. Centrifuge blood at 1700 rcf for 10 min, then immediately decant the serum and freeze it at

–70°C in a screw cap freezer vial (Sarstedt). If this is not possible, the serum can be

stored at –20°C for up to 5 days before transfer to a –70°C freezer.

4. Before SELDI or MALDI analysis, thaw the sample and divide it into small-volume

aliquots to avoid repeated freeze–thaw cycles. When possible, no sample should be taken

through more than two freeze–thaw cycles, and the number of freeze–thaw cycles should

be recorded if unused volumes are returned to the freezer.

3.2. Preparation of Human Serum<br />

Expression profiling of proteins/peptides uses both peak mass and intensity to quantify changes in differential spectra. This necessitates the use of a QC standard to monitor instrument performance (17). The QC sample routinely used in our lab is pooled human serum collected with the same serum collection protocol (see Subheading 3.1) used for the experimental samples. Efforts have been made to develop a standardized QC sample for serum mass spectrometry profiling (18). Until such a standard is available, a large volume of serum can be pooled and aliquoted to be run with every experimental sample set. This QC sample should be assayed with the same processing technique that will be employed for the experimental samples, and the data from multiple runs should be analyzed. In this way, the inter- and intra-assay variability can be determined. Additionally, the spectra obtained from the QC sample can be used as a benchmark for the integrity of processing, instrument optimization, and ProteinChip variability. We therefore recommend including several QC samples on each MALDI target and one QC spot on each SELDI ProteinChip.

Acceptable levels of reproducibility need to be established for any new technology, and sample preparation is the most critical step in producing reproducible spectra (see Notes 3, 4, and 5). We have optimized the SELDI system with high-throughput robotics, and previous studies in our laboratory have determined that the mass accuracy of SELDI spectra is highly reproducible, with CVs of 0.05%. Operating in linear mode, we have found the mass accuracy of an Ultraflex-TOF–TOF to be 0.01% CV. Normalized intensity values for individual peaks in QC sera routinely have CVs below 20% for samples prepared robotically in our lab using either SELDI or MALDI-MS.
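The reproducibility figures above are coefficients of variation (CV = standard deviation / mean × 100) computed over replicate QC spectra. The sketch below computes a mass CV and a normalized-intensity CV for each QC peak and flags peaks that exceed the 20% intensity criterion. It assumes normalization by total ion current (TIC). TIC normalization is one common choice and is not necessarily the normalization behind the values quoted above. The data layout, function names, and example values are hypothetical.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation in percent: sample SD / mean * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


def qc_report(spectra, max_intensity_cv=20.0):
    """Per-peak reproducibility across replicate QC spectra.

    spectra: list of dicts, each {"tic": total ion current,
             "peaks": {peak_name: (mz, intensity)}}; every spectrum must
             contain the same peak names.
    Returns {peak_name: (mass_cv_pct, intensity_cv_pct, passes)}.
    """
    report = {}
    for name in spectra[0]["peaks"]:
        mzs = [s["peaks"][name][0] for s in spectra]
        # TIC normalization (an assumption) removes spot-to-spot differences
        # in overall signal before intensities are compared.
        norm = [s["peaks"][name][1] / s["tic"] for s in spectra]
        int_cv = cv_percent(norm)
        report[name] = (cv_percent(mzs), int_cv, int_cv <= max_intensity_cv)
    return report


if __name__ == "__main__":
    qc_spectra = [  # hypothetical replicate QC spots
        {"tic": 10000.0, "peaks": {"p4212": (4212.3, 1000.0), "p5904": (5904.6, 500.0)}},
        {"tic": 20000.0, "peaks": {"p4212": (4212.5, 2200.0), "p5904": (5904.6, 1600.0)}},
        {"tic": 5000.0, "peaks": {"p4212": (4212.1, 450.0), "p5904": (5904.6, 100.0)}},
    ]
    for peak, (mass_cv, int_cv, ok) in qc_report(qc_spectra).items():
        print(f"{peak}: mass CV {mass_cv:.4f}%, intensity CV {int_cv:.1f}%, pass={ok}")
```

With these made-up values, the first peak passes with a 10% intensity CV. The second peak fails at 60%.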


3.3. Serum Protein Profiling on the MALDI-TOF–TOF<br />

3.3.1. MB Fractionation of Human Serum<br />

In our laboratory, these steps are performed by the ClinProt robot; a comparable manual method is outlined below. Sequential fractionation can also be performed with multiple bead types.

1. Vortex MBs thoroughly for at least 1 min.<br />

2. In a 0.5 mL microcentrifuge tube, pretreat 5 μL of MBs with 50 μL MB-IMAC Cu binding

solution.

3. Place the tube in the magnetic bead separator (MBS) and move it between<br />

adjacent wells 10 times.<br />

4. Collect the beads on the wall of the tube for 20 s and remove the supernatant<br />

carefully with a pipette.<br />

5. Repeat this pretreatment two more times.<br />

6. Add 20 μL of serum and mix carefully with the beads by pipetting up and down<br />

five times.<br />

7. Keep at RT for 2 min.<br />

8. Place the tube in the MBS and wait for 20 s for beads to separate.<br />

9. Carefully remove the supernatant with a pipette tip (the unbound fraction can

be discarded or saved for analysis or a second fractionation step, if desired).

10. To wash, add 80 μL MB-IMAC Cu wash solution and place the tube in the MBS

again. Move the tube back and forth between adjacent wells 10 times.

11. Collect the beads on the tube wall for 20 s and remove the supernatant carefully<br />

with a pipette.<br />

12. Repeat this wash two more times.<br />

13. To elute, add 10 μL MB-IMAC Cu elution solution and mix. Let the beads sit<br />

for 5 min at RT.<br />

14. Place the tube on the MBS and wait 20 s for beads to separate.<br />

15. Transfer the eluate to a fresh tube.<br />
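When the manual method above is run on a batch of samples, it helps to total the bead and solution volumes in advance. The sketch below multiplies the per-sample volumes from steps 2–13 by the number of samples. The 10% excess for pipetting losses and the function name are assumptions, not part of the kit instructions.

```python
# Per-sample volumes (uL) from the manual IMAC-Cu protocol above.
PER_SAMPLE_UL = {
    "magnetic_beads": 5,            # step 2
    "binding_solution": 3 * 50,     # step 2, performed three times in total (step 5)
    "wash_solution": 3 * 80,        # step 10, performed three times in total (step 12)
    "elution_solution": 10,         # step 13
}


def reagent_plan(n_samples, excess_pct=10):
    """Total uL of each reagent for n_samples, rounded up, with a percentage excess.

    Integer arithmetic avoids floating-point rounding surprises; the default
    10% excess is an assumed allowance for pipetting losses.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    return {
        reagent: -(-vol * n_samples * (100 + excess_pct) // 100)  # ceiling division
        for reagent, vol in PER_SAMPLE_UL.items()
    }


if __name__ == "__main__":
    for reagent, total in reagent_plan(24).items():
        print(f"{reagent}: {total} uL")
```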

3.3.2. Data Collection on MALDI-TOF–TOF Instrument<br />

To detect proteins over the entire mass range on a MALDI instrument, it is necessary to optimize the instrument settings for both low mass (typically 2000–20,000 Da) and high mass (20,000–100,000 Da or greater). Sensitivity and resolution are best below m/z 20,000, and this is the mass range we routinely use for most profiling experiments.

1. Dilute the eluates 1:10 in CHCA matrix prepared according to the AnchorChip

protocol (0.3 mg/mL in ethanol:acetone 2:1). SPA and/or 2,5-dihydroxybenzoic acid

may also be used.

2. Spot 1 μL of the sample diluted in matrix onto the 600 μm diameter AnchorChip<br />

target. Also spot 1 μL of the peptide standard diluted according to the manufacturer’s<br />

instructions.


3. Allow spots to dry.<br />

4. Perform external calibration with the peptide standard using a linear mode method.<br />

5. Using a QC spot, collect at least 300 shots in linear mode, adjusting the laser energy

and detector sensitivity to maximize the signal and resolution of the major peaks.

In linear mode, the resolution (m/Δm) of the three major peaks should typically be

greater than 600.

6. Instrument settings vary with the instrument set-up and are more numerous than can

feasibly be described in this chapter, but the most important settings to optimize are

acceleration voltage (IS1), laser power, time lag focusing (or PIE), detector settings,

and matrix suppression. Our basic instrument settings in linear mode are as follows:

IS1, 22<br />

Laser, 37% with laser attenuation offset at 48%, range at 40%<br />

Time lag focus, 200 ns<br />

Detector Gain, 24×<br />

Matrix suppression, gated with suppression up to m/z 800<br />
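Two quantities underlie steps 4 and 5: the external calibration that converts flight time to m/z, and the resolution of the QC peaks. The sketch below assumes the simplified linear-TOF relation sqrt(m/z) = c0 + c1·t, fitted by least squares to calibrant peaks. It computes resolution as m/z divided by the full width at half maximum (FWHM). Instrument software typically uses more elaborate calibration models, so treat this only as an illustration. Function names and example values are hypothetical.

```python
import math


def fit_tof_calibration(flight_times, calibrant_mz):
    """Least-squares fit of sqrt(m/z) = c0 + c1 * t for known calibrants.

    Uses the simplified linear-TOF relation (flight time proportional to
    sqrt(m/z), plus an offset); returns (c0, c1).
    """
    if len(flight_times) != len(calibrant_mz) or len(flight_times) < 2:
        raise ValueError("need at least two calibrants with matching flight times")
    ys = [math.sqrt(m) for m in calibrant_mz]
    n = len(ys)
    mean_t = sum(flight_times) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in flight_times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(flight_times, ys))
    c1 = sxy / sxx
    c0 = mean_y - c1 * mean_t
    return c0, c1


def time_to_mz(t, c0, c1):
    """Convert a flight time to m/z with a fitted calibration."""
    return (c0 + c1 * t) ** 2


def resolution(mz, fwhm):
    """Mass resolution m / delta-m, with delta-m taken as the peak FWHM."""
    return mz / fwhm


if __name__ == "__main__":
    # Hypothetical instrument with c0 = 0 and c1 = 2 (arbitrary time units).
    masses = [1000.0, 4000.0, 9000.0]
    times = [math.sqrt(m) / 2.0 for m in masses]
    c0, c1 = fit_tof_calibration(times, masses)
    print(f"unknown at t=40.0 -> m/z {time_to_mz(40.0, c0, c1):.1f}")
    print(f"resolution of a 6.0-wide peak at m/z 4212.3: {resolution(4212.3, 6.0):.0f}")
```

A peak at m/z 4212.3 with a FWHM of 6.0 gives a resolution of about 702. That value would meet the greater-than-600 criterion.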

All spectra should be processed with the same baseline subtraction protocol. Perform peak detection with a uniform definition of the requisite signal-to-noise ratio and mass window. Although MALDI techniques have the potential to produce protein profiles that contain patterns capable of distinguishing disease and identifying biomarkers, a single analysis may produce many hundreds of protein peaks (see Note 2). Discerning the differentiating patterns therefore poses a major challenge, and the analysis and interpretation of such large volumes of proteomic data remain an unsolved bioinformatics problem. Many classification tools are currently being used successfully for the analysis of MALDI data, including Fisher discriminant analysis, classification and regression trees (CART) (19,20), support vector machines (21), artificial neural networks (22), boosted decision tree analysis (23), and genetic algorithms (24). General considerations for data preparation before any type of analysis should include averaging intensity values for duplicate samples, baseline subtraction, and peak picking.
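The preprocessing steps named above can be sketched as follows: averaging duplicates, baseline subtraction, and peak picking with a uniform signal-to-noise threshold. This is not the algorithm of any particular vendor package. It assumes a rolling-minimum baseline, a noise estimate equal to the median absolute deviation of the corrected signal, and a local-maximum peak definition. The window size, S/N threshold, function names, and example spectrum are hypothetical.

```python
import statistics


def average_duplicates(spectrum_a, spectrum_b):
    """Point-by-point mean of two replicate spectra sampled on the same m/z grid."""
    return [(a + b) / 2.0 for a, b in zip(spectrum_a, spectrum_b)]


def subtract_baseline(intensities, window=5):
    """Subtract a rolling-minimum baseline taken over +/- window points."""
    n = len(intensities)
    corrected = []
    for i, y in enumerate(intensities):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        corrected.append(y - min(intensities[lo:hi]))
    return corrected


def estimate_noise(intensities):
    """Noise level as the median absolute deviation (MAD) of the signal."""
    med = statistics.median(intensities)
    return statistics.median(abs(y - med) for y in intensities)


def pick_peaks(mz, intensities, min_snr=3.0):
    """Return (m/z, intensity) for local maxima whose S/N is at least min_snr."""
    noise = estimate_noise(intensities) or 1e-12  # guard against a zero MAD
    peaks = []
    for i in range(1, len(intensities) - 1):
        y = intensities[i]
        if y > intensities[i - 1] and y >= intensities[i + 1] and y / noise >= min_snr:
            peaks.append((mz[i], y))
    return peaks


if __name__ == "__main__":
    # Hypothetical replicate spectra: a flat baseline near 10 with one peak.
    rep1 = [10, 11, 10, 12, 10, 11, 10, 12, 10, 11, 100, 11, 10, 12, 10, 11, 10, 12, 10, 11]
    rep2 = list(rep1)
    mz_axis = [1000.0 + 10.0 * i for i in range(len(rep1))]
    corrected = subtract_baseline(average_duplicates(rep1, rep2))
    print(pick_peaks(mz_axis, corrected))
```

The example prints `[(1100.0, 90.0)]`. The small ripples in the baseline are local maxima but fall below the S/N threshold of 3. A rolling-minimum window must be wider than the broadest peak of interest; otherwise the baseline step will erode real peaks.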

3.4. Protein Identification Using MALDI-TOF/TOF<br />

Biomarker candidates detected by protein profiling can be subjected to TOF/TOF analysis to identify peptides directly from serum profiles, using the same sample spot and/or a respotted sample. Initial analysis in reflectron mode allows visualization of the target (parent) peak. Metastable fragment ions of the respective precursor ion are then analyzed after a second acceleration step, and the resulting fragment pattern is interpreted and used for peptide identification via database search. The ability to directly sequence the peptides of interest is a powerful feature of this method (see Fig. 3).

[Figure 3: (A) serum profile, "1SLin, Baseline subtracted", Intens. [a.u.] vs. m/z 1200–2600, selected peak at 1468.0; (B) MS/MS fragmentation spectrum; (C) Mascot Peptide View: MS/MS fragmentation of DSGEGDFLAEGGGVR, found in gi|229185, fibrinopeptide A, residues 2–16 of ADSGEGDFLAEGGGVR; observed 1465.72, Mr(expt) 1464.72, Mr(calc) 1464.65, delta 0.07, missed cleavages 0.]

Fig. 3. Identification of a serum peptide directly from the serum profile. The serum profile (A) was generated in linear mode on the Ultraflex-TOF/TOF, and a peptide (m/z 1469.09) was selected from it for MS/MS analysis, resulting in a fragmentation spectrum (B). This peptide was matched to fibrinopeptide A using the Mascot search engine (C).
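The database match in Fig. 3 rests on two comparisons. The observed mass is compared with the mass calculated from the candidate sequence, and the MS/MS spectrum is compared with the predicted fragment ions. The sketch below computes the monoisotopic neutral mass of DSGEGDFLAEGGGVR and its singly charged b- and y-ion series from standard monoisotopic residue masses. It ignores modifications, and the function names are illustrative rather than part of Mascot.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.06332, "W": 186.079313,
}
WATER = 18.010565   # added once for the intact peptide (N- and C-termini)
PROTON = 1.007276   # charge carrier for singly protonated ions


def peptide_mass(seq):
    """Monoisotopic neutral mass (Mr) of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def fragment_ions(seq):
    """Singly charged b and y ion m/z values for fragments of length 1..len-1."""
    b_ions, y_ions = [], []
    for i in range(1, len(seq)):
        b_ions.append(sum(RESIDUE_MASS[aa] for aa in seq[:i]) + PROTON)
        y_ions.append(sum(RESIDUE_MASS[aa] for aa in seq[-i:]) + WATER + PROTON)
    return b_ions, y_ions


if __name__ == "__main__":
    seq = "DSGEGDFLAEGGGVR"
    mr = peptide_mass(seq)
    print(f"Mr(calc) = {mr:.2f}, [M+H]+ = {mr + PROTON:.2f}")
    b, y = fragment_ions(seq)
    print("b:", [round(m, 2) for m in b[:4]])
    print("y:", [round(m, 2) for m in y[:4]])
```

For this sequence, the calculated neutral mass is about 1464.65 Da, in agreement with Mr(calc) in the Mascot view. The y1 ion (the C-terminal arginine) falls at m/z 175.12.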

3.5. Serum Protein Profiling on SELDI-TOF<br />

3.5.1. Preparation of Serum<br />

Note: In our laboratory, all of the following steps, including ProteinChip preparation and serum incubation on the arrays, are performed robotically by the BioMek 2000 robot. The protocols below outline a manual method.

1. Thaw human serum samples on ice. Use separate aliquots to set up duplicates or<br />

triplicates.<br />

2. Add 20 μL human serum into a 1.7 mL microcentrifuge tube (alternatively, this<br />

can be performed in a v-bottom 96-well plate for large sample sets).


3. Add 30 μL of 8 M Urea, 1% CHAPS in PBS pH 7.4.<br />

4. Vortex the tube at 4°C for 10 min or, if using a plate, seal it and place it on the MicroMix 5

shaker at 4°C for 10 min (shaker settings: form 20, amplitude 5, time 10 min).

5. Add 100 μL 1 M Urea, 0.125% CHAPS in PBS pH 7.4.<br />

6. Vortex or pipette up and down to mix (total volume 150 μL).<br />

7. Dilute the sample 1:5 in PBS pH 7.4 by adding 600 μL PBS. If using a plate, transfer

35 μL of the serum–urea mixture from the first plate to a second plate, then

add 140 μL of PBS. Mix by vortexing the tube or pipetting up and down.

8. Store on ice until ready to add samples to a bioprocessor containing ProteinChip<br />

arrays.<br />
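The preparation steps above dilute the serum twice: first with the urea/CHAPS solutions, then 1:5 in PBS. The sketch below tracks the resulting concentrations with exact fractions, assuming that volumes are additive. Each 100 μL applied to an array then contains about 2.7 μL of serum equivalent in roughly 0.45 M urea and 0.057% CHAPS. The helper name `combine` is hypothetical.

```python
from fractions import Fraction as F


def combine(parts):
    """Mix solutions, assuming additive volumes.

    parts: list of (volume_uL, {component: concentration}); a component that
    is absent from a part contributes zero. Returns (total_uL, concentrations).
    """
    total = sum(vol for vol, _ in parts)
    names = {name for _, conc in parts for name in conc}
    return total, {
        name: sum(vol * conc.get(name, 0) for vol, conc in parts) / total
        for name in names
    }


if __name__ == "__main__":
    # Steps 2-6: 20 uL serum + 30 uL 8 M urea/1% CHAPS + 100 uL 1 M urea/0.125% CHAPS.
    vol1, mix1 = combine([
        (F(20), {"serum_fraction": F(1)}),
        (F(30), {"urea_M": F(8), "chaps_pct": F(1)}),
        (F(100), {"urea_M": F(1), "chaps_pct": F(1, 8)}),
    ])
    # Step 7: 150 uL mixture + 600 uL PBS (1:5 dilution).
    vol2, mix2 = combine([(vol1, mix1), (F(600), {})])
    print(f"total volume: {vol2} uL")
    for name, value in sorted(mix2.items()):
        print(f"{name}: {float(value):.4f}")
    print(f"serum per 100 uL on the array: {float(100 * mix2['serum_fraction']):.2f} uL")
```

The plate variant in step 7 (35 μL mixture plus 140 μL PBS) gives exactly the same 1:5 dilution.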

3.5.2. Preparation of ProteinChip Arrays<br />

This protocol describes the preparation of IMAC-Cu2+ ProteinChips. Other types of chips should be prepared according to the manufacturer's (Ciphergen) instructions.

1. Label or number IMAC chips on the reverse side and place them into the

bioprocessor according to the manufacturer's instructions (see Note 1).

2. Add 50 μL of 100 mM CuSO4 onto each spot or array.

3. Shake on Micromix 5 for 10 min at RT.<br />

4. Shaker settings: form 20, amplitude 5, time 10 min<br />

5. Flick the plate to discard the CuSO4 and pat it upside down onto a clean paper

towel to remove residual liquid (liquid can also be removed by aspiration, but

be careful not to touch the array surface with the pipette tip).

6. Wash twice with 200 μL of HPLC water, 5 min each, at RT on the MicroMix shaker

using the same form and amplitude settings as before.

7. Flick plate and pat on paper towel.<br />

8. Add 50 μL of 100 mM sodium acetate pH 4.0.<br />

9. Shake on Micromix shaker for 5 min at RT.<br />

10. Flick plate and pat as before.<br />

11. Wash twice with HPLC water, 5 min each, at RT on the MicroMix shaker.

12. Add 200 μL PBS pH 7.4.<br />

13. Flick plate and pat as before.<br />

14. Wash twice with PBS pH 7.4, 5 min each, at RT on the MicroMix shaker.

Leave the last volume of PBS on the plate until ready to use.

3.5.3. Incubation of Serum on ProteinChip Arrays<br />

1. Remove PBS from bioprocessor with multichannel pipettor, one row at a time<br />

to avoid drying chips.<br />

2. Add 100 μL of each sample to the respective arrays. Note: the placement of samples

on the ProteinChip arrays should be randomized, and duplicate samples

should also be placed randomly.


MALDI/SELDI Protein Profiling of Serum 135<br />

3. Seal the plate and shake the bioprocessor on the Micromix shaker (form 20, amplitude 5) for 30 min at RT.

4. Carefully remove the samples with a pipette, changing tips to avoid cross-contamination.

5. Add 200 μL of PBS, pH 7.4, to each array and shake on the Micromix shaker for 5 min at RT, using the same shaker settings.

6. Remove the PBS with a multichannel pipettor, changing tips for each row.

7. Wash with 200 μL of HPLC water and shake on the Micromix shaker for 5 min at RT.

8. Remove the water with a multichannel pipettor.

9. Repeat the water wash.

10. Remove the chips from the bioprocessor and allow them to dry completely.
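The randomized placement called for in step 2 can be generated before the bioprocessor is loaded. The following Python sketch is illustrative only: it assumes a bioprocessor holding 12 ProteinChip arrays with eight spots (A–H) each, i.e., 96 positions, and the function name, sample IDs, and seed are hypothetical.

```python
import random
from collections import Counter


def randomize_layout(sample_ids, replicates=2, n_chips=12, spots="ABCDEFGH", seed=None):
    """Randomly assign every replicate of every sample to a unique chip/spot position.

    Assumes a 12-chip x 8-spot bioprocessor (96 positions); adjust n_chips/spots otherwise.
    """
    positions = [f"chip{c:02d}-{s}" for c in range(1, n_chips + 1) for s in spots]
    loads = [sid for sid in sample_ids for _ in range(replicates)]
    if len(loads) > len(positions):
        raise ValueError(f"{len(loads)} loads but only {len(positions)} positions")
    rng = random.Random(seed)  # a fixed seed makes the layout reproducible for the run log
    chosen = rng.sample(positions, len(loads))  # distinct positions in random order
    return dict(sorted(zip(chosen, loads)))  # position -> sample, listed chip by chip


samples = [f"S{i:02d}" for i in range(1, 49)]  # 48 hypothetical serum samples
layout = randomize_layout(samples, replicates=2, seed=2008)
print(len(layout), Counter(layout.values()).most_common(1))
for position, sample in list(layout.items())[:8]:  # spots of the first chip
    print(position, sample)
```

Recording the seed (or the printed layout) with the run keeps the placement auditable. Duplicates of one sample may land on the same chip; if the study design requires them on different chips, that constraint would have to be added.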

3.5.4. Adding SPA Matrix to the Chips

1. To one tube of SPA, add 200 μL of acetonitrile (100%).

2. Add 200 μL of 1% TFA (final composition: 12.5 mg/mL SPA in 50% acetonitrile, 0.5% TFA).

3. Vortex for 5 min at RT.

4. Spin briefly.

5. Add 1.0 μL of SPA matrix to each dry spot, being careful not to touch the array surface with the pipette tip.

6. Allow to dry.

7. The arrays are now ready to be read on the SELDI instrument. Note: store the arrays in the dark in a cool, dry place. It is recommended to read the chips within a few hours of matrix addition; some signal degradation may occur if the arrays are stored for more than 24 h.
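As a check on steps 1 and 2, the final matrix composition follows from volume-weighted mixing. This Python sketch assumes that volumes are additive (acetonitrile–water mixtures actually contract slightly) and that one SPA tube holds 5 mg, the amount implied by 12.5 mg/mL in 400 μL; the helper name is hypothetical.

```python
def mix(components):
    """Combine solutions and return (total volume, final solute concentrations).

    components: list of (volume_uL, {solute: concentration}) pairs. Concentrations of the
    same solute must share a unit (e.g., % v/v); volumes are treated as additive.
    """
    total = sum(volume for volume, _ in components)
    final = {}
    for volume, solutes in components:
        for name, conc in solutes.items():
            final[name] = final.get(name, 0.0) + conc * volume / total
    return total, final


SPA_MG_PER_TUBE = 5.0  # assumption implied by 12.5 mg/mL x 400 uL

total_uL, conc = mix([
    (200, {"acetonitrile_%": 100.0}),  # step 1: 200 uL of 100% acetonitrile
    (200, {"TFA_%": 1.0}),             # step 2: 200 uL of 1% TFA
])
spa_mg_per_mL = SPA_MG_PER_TUBE / (total_uL / 1000)
spots_per_tube = int(total_uL // 1.0)  # 1.0 uL of matrix per spot (step 5), no dead volume
print(total_uL, conc, spa_mg_per_mL, spots_per_tube)
# 400 {'acetonitrile_%': 50.0, 'TFA_%': 0.5} 12.5 400
```

The same helper applies to any matrix recipe that is assembled by volume, as long as all concentrations of a given solute are expressed in one unit.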

3.5.5. Collection of Spectra on SELDI-TOF

Here we describe the collection of spectra using the Ciphergen PBS II instrument.

3.5.5.1. Calibration

Calibration of the SELDI instrument is crucial for accurate mass analysis of the proteins present in samples. Ions of lower m/z fly faster than ions of higher m/z, so the m/z of an ion can be calculated from its flight time once the instrument has been calibrated with compounds of known mass. For the most accurate mass assignments, the instrument should be calibrated under conditions identical to the experimental conditions. Calibration should be performed at the beginning of an experimental run and thereafter on every day that experimental data are collected. When obtaining calibration spectra, use instrument settings (e.g., detector voltage, lag time) as close as possible to those used for serum profiling.

1. Reconstitute one vial each of the seven-in-one peptide and protein standards according to the manufacturer’s instructions. Aliquot and freeze.


136 Cazares et al.<br />

2. Mix the standards with SPA according to the package insert.

3. Deposit 1 μL of each standard onto an array of an NP20 ProteinChip.

4. Air-dry the arrays completely (usually 30–60 min).

5. Read the array in the SELDI instrument using the spot protocol created to read the experimental samples (see below). Lower the laser intensity so that the peaks from the standards do not exceed 75% of the maximum signal intensity.

6. Follow the calibration dialogue in the PBSII SELDI instrument software to save the calibration equations.
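The calibration principle described above (m/z calculated from flight time using compounds of known mass) can be illustrated with an idealized linear time-of-flight model, in which flight time t satisfies t = t0 + k·sqrt(m/z), so sqrt(m/z) is a straight line in t. The Python sketch below fits that line by least squares. It is not the equation used by the Ciphergen software, which may include additional terms, and the standard masses and flight times are synthetic values generated from assumed constants, not instrument data.

```python
import math


def fit_calibration(flight_times_us, known_mz):
    """Least-squares fit of sqrt(m/z) = a * t + b (idealized linear TOF model)."""
    if len(flight_times_us) != len(known_mz) or len(known_mz) < 2:
        raise ValueError("need at least two standards with matching flight times")
    ys = [math.sqrt(m) for m in known_mz]
    n = len(ys)
    t_mean = sum(flight_times_us) / n
    y_mean = sum(ys) / n
    stt = sum((t - t_mean) ** 2 for t in flight_times_us)
    sty = sum((t - t_mean) * (y - y_mean) for t, y in zip(flight_times_us, ys))
    a = sty / stt
    b = y_mean - a * t_mean
    return a, b


def mz_from_time(t_us, a, b):
    """Convert a flight time (microseconds) to m/z using the fitted constants."""
    return (a * t_us + b) ** 2


def ppm_error(measured, expected):
    """Mass error in parts per million."""
    return (measured - expected) / expected * 1e6


# Synthetic example: hypothetical standard masses, flight times generated from assumed constants.
ASSUMED_A, ASSUMED_B = 2.0, -1.0
standards_mz = [1500.0, 3000.0, 5000.0, 8000.0, 12000.0]
times_us = [(math.sqrt(m) - ASSUMED_B) / ASSUMED_A for m in standards_mz]

a, b = fit_calibration(times_us, standards_mz)
for t, m in zip(times_us, standards_mz):
    calc = mz_from_time(t, a, b)
    print(f"{m:>8.1f}  {calc:>10.3f}  {ppm_error(calc, m):+.2e} ppm")
```

With real standards, the residual ppm errors indicate calibration quality. Because the fitted constants hold only for the conditions under which the standards were acquired, the sketch also shows why calibration spectra should use the same detector voltage and lag time as the samples.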

3.5.5.2. SELDI Instrument Settings Optimization

SELDI instrument optimization refers to adjusting the data-collection settings to maximize signal intensity while retaining optimal resolution and the lowest noise. In our studies, three protein peaks (m/z 5900, 7764, and 9284 ± 0.2%) are consistently present in the QC sera processed on IMAC-Cu²⁺ ProteinChips, and these are used as benchmarks for instrument optimization (see Fig. 1). Based on multiple runs, the instrument settings are adjusted to maximize signal-to-noise ratio and resolution for these three peaks. Specific criteria were then set to ensure instrument optimization (see Semmes et al. (17)). Generally, to obtain a specific overall intensity level (e.g., to make two instruments behave similarly, or to maintain similar intensity levels over time), three parameters can be adjusted: laser intensity, detector sensitivity, and detector voltage. The following spot protocols for data collection on the SELDI reader are a starting point. The settings will differ from instrument to instrument and will change over time, depending on cumulative laser use and detector settings.
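The benchmark check described above can be expressed as a simple tolerance test. This Python sketch takes a list of detected peak m/z values from a QC serum spectrum (the peak list in the example is hypothetical) and reports, for each benchmark (m/z 5900, 7764, 9284), the closest peak within ±0.2%. It checks only mass presence; the signal-to-noise and resolution criteria of Semmes et al. (17) are not reproduced here, and the function names are hypothetical.

```python
BENCHMARK_MZ = (5900.0, 7764.0, 9284.0)  # QC serum peaks on IMAC-Cu2+ chips (from the text)
TOLERANCE = 0.002  # +/-0.2% relative mass window


def match_benchmarks(observed_mz, benchmarks=BENCHMARK_MZ, tol=TOLERANCE):
    """Return {benchmark m/z: closest observed peak within +/-tol, or None}."""
    result = {}
    for ref in benchmarks:
        in_window = [m for m in observed_mz if abs(m - ref) <= tol * ref]
        result[ref] = min(in_window, key=lambda m: abs(m - ref)) if in_window else None
    return result


def all_benchmarks_found(observed_mz):
    """True when every benchmark peak has a match inside its window."""
    return all(v is not None for v in match_benchmarks(observed_mz).values())


qc_peaks = [4210.3, 5903.1, 7770.0, 9300.5]  # hypothetical peak list from one QC spot
print(match_benchmarks(qc_peaks))
print(all_benchmarks_found(qc_peaks))
```

A missing benchmark peak points to a calibration or spot problem that should be resolved before the intensity-based adjustments are attempted.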

Data collection: standard spot protocol for QC serum on IMAC-Cu²⁺ (for a PBSII)

1. Set detector voltage to 1650 V.

2. Set high mass to 100,000 Da, optimized from 3000 to 50,000 Da.

3. Set starting laser intensity to 220.

4. Set starting detector sensitivity to 7.

5. Focus lag time at 900 ns.

6. Set data acquisition method to SELDI quantitation.

7. Set SELDI acquisition parameters: starting position 20, delta 4, transients per position 12, ending position 80 (192 total shots).

8. Set warming positions with two shots at intensity 230, and do not include warming shots.
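Step 7 can be read as a raster over laser positions 20 to 80 in steps of 4, with 12 transients collected at each position. Under that reading, which is consistent with the 192 total shots stated for these settings in this section, the shot count works out as follows. The function name is hypothetical, and warming shots are excluded, as in step 8.

```python
def acquisition_shots(start=20, end=80, delta=4, transients_per_position=12):
    """Return (positions sampled, total shots) for an inclusive start..end raster.

    Warming shots are not counted, matching step 8 ("do not include warming shots").
    """
    if delta <= 0 or end < start:
        raise ValueError("need delta > 0 and end >= start")
    positions = range(start, end + 1, delta)
    return len(positions), len(positions) * transients_per_position


n_positions, total_shots = acquisition_shots()
print(n_positions, total_shots)  # 16 192
print(acquisition_shots(delta=5))  # (13, 156): a coarser raster fires fewer shots
```

Keeping the total shot count fixed across runs matters because the averaged spectrum intensity depends on how many transients contribute to it.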

When adjusting to meet QC criteria:



• Increasing detector voltage typically increases both signal and noise. Change it in steps of 25 V.

• Increasing laser intensity increases signal and generally decreases resolution. Change it in steps of 10.

• Increasing detector sensitivity increases signal intensity. The typical working range is 6–8.

For example, if the settings above do not meet QC specifications, try the following.

If S/N passes easily but resolution is low, reduce the detector voltage or laser intensity:

1. Set detector voltage to 1625 V.

2. Set high mass to 100,000 Da, optimized from 3000 to 50,000 Da.

3. Set starting laser intensity to 220.

4. Set starting detector sensitivity to 7.

5. Focus lag time at 900 ns.

6. Set data acquisition method to SELDI quantitation.

7. Set SELDI acquisition parameters: starting position 20, delta 4, transients per position 12, ending position 80 (192 total shots).

8. Set warming positions with two shots at intensity 230, and do not include warming shots.

If resolution passes but S/N is low, increase the laser intensity or detector voltage:

1. Set detector voltage to 1650 V.

2. Set high mass to 100,000 Da, optimized from 3000 to 50,000 Da.

3. Set starting laser intensity to 230.

4. Set starting detector sensitivity to 7.

5. Focus lag time at 900 ns.

6. Set data acquisition method to SELDI quantitation.

7. Set SELDI acquisition parameters: starting position 20, delta 4, transients per position 12, ending position 80 (192 total shots).

8. Set warming positions with two shots at intensity 230, and do not include warming shots.

If intensity is too high (generally, keep it under 65), reduce the laser intensity and/or detector sensitivity:

1. Set detector voltage to 1650 V.

2. Set high mass to 100,000 Da, optimized from 3000 to 50,000 Da.

3. Set starting laser intensity to 220.

4. Set starting detector sensitivity to 6.

5. Focus lag time at 900 ns.

6. Set data acquisition method to SELDI quantitation.



7. Set SELDI acquisition parameters: starting position 20, delta 4, transients per position 12, ending position 80 (192 total shots).

8. Set warming positions with two shots at intensity 230, and do not include warming shots.
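The troubleshooting rules above can be summarized as a small decision function. This Python sketch encodes one reading of them: it starts from the standard spot protocol, applies the step sizes given in the bullets (25 V for detector voltage, 10 for laser intensity, sensitivity kept within 6–8), and reproduces the three example protocols. The class, the function, the order in which conditions are checked, and the QC inputs are all assumptions; the final decision still rests on the QC criteria and operator judgment.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SpotProtocol:
    """Adjustable PBSII settings from the standard QC spot protocol."""
    detector_voltage: int = 1650  # V
    laser_intensity: int = 220
    detector_sensitivity: int = 7
    lag_time_ns: int = 900


def suggest_adjustment(p, sn_ok, resolution_ok, max_intensity, intensity_limit=65):
    """Return an adjusted protocol for one QC outcome (one change per iteration)."""
    if max_intensity > intensity_limit:
        # Too intense: lower sensitivity while it stays within 6-8, otherwise lower the laser.
        if p.detector_sensitivity > 6:
            return replace(p, detector_sensitivity=p.detector_sensitivity - 1)
        return replace(p, laser_intensity=p.laser_intensity - 10)
    if sn_ok and not resolution_ok:
        # Signal to spare but poor resolution: back off detector voltage by 25 V.
        return replace(p, detector_voltage=p.detector_voltage - 25)
    if resolution_ok and not sn_ok:
        # Good resolution but weak signal: raise laser intensity by 10.
        return replace(p, laser_intensity=p.laser_intensity + 10)
    # Both pass (nothing to do) or both fail (needs manual review): leave unchanged.
    return p


base = SpotProtocol()
print(suggest_adjustment(base, sn_ok=True, resolution_ok=False, max_intensity=40))
print(suggest_adjustment(base, sn_ok=False, resolution_ok=True, max_intensity=40))
print(suggest_adjustment(base, sn_ok=True, resolution_ok=True, max_intensity=80))
```

The three calls yield the detector voltage of 1625 V, laser intensity of 230, and detector sensitivity of 6 used in the example protocols above. Changing one parameter per iteration keeps the effect of each adjustment attributable.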

After data collection, each spectrum should be calibrated for mass using the current peptide calibration. If higher-molecular-weight data are included in the analysis, the protein standard calibration should be used for peaks in that mass range. Spectra should be normalized by total ion current (a feature of the Ciphergen software), using the same normalization coefficient and low-mass cutoff for all spectra (2000 Da for SPA matrix, to exclude matrix peaks). All spectra should also be processed with the same baseline subtraction protocol. Perform peak detection using a uniform definition of the required signal-to-noise ratio (usually 3) and mass window (usually 0.2–0.3%).
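The post-processing steps above (total ion current normalization above a low-mass cutoff, then peak detection with a fixed S/N threshold and mass window) can be sketched in plain Python to make the parameters concrete. This is not the Ciphergen software's algorithm. Spectra are simplified to lists of (m/z, intensity) pairs after baseline subtraction; normalization scales each spectrum to the mean TIC of the set (one common choice of normalization coefficient); the noise level is assumed to be estimated beforehand; and peaks from different spectra are grouped when they fall within a relative window of the first peak in the group. Function names and the toy spectra are hypothetical.

```python
def tic_normalize(spectra, low_mass_cutoff=2000.0):
    """Scale each spectrum so its total ion current above the cutoff equals the mean TIC."""
    tics = [sum(i for mz, i in s if mz >= low_mass_cutoff) for s in spectra]
    if any(tic <= 0 for tic in tics):
        raise ValueError("a spectrum has no signal above the low-mass cutoff")
    target = sum(tics) / len(tics)  # same normalization target for every spectrum
    return [[(mz, i * target / tic) for mz, i in s] for s, tic in zip(spectra, tics)]


def detect_peaks(spectrum, noise, min_sn=3.0, low_mass_cutoff=2000.0):
    """Return m/z of local maxima above the cutoff with intensity/noise >= min_sn."""
    peaks = []
    for k in range(1, len(spectrum) - 1):
        mz, inten = spectrum[k]
        is_max = inten > spectrum[k - 1][1] and inten >= spectrum[k + 1][1]
        if mz >= low_mass_cutoff and is_max and inten / noise >= min_sn:
            peaks.append(mz)
    return peaks


def cluster_peaks(peak_lists, window=0.003):
    """Group peaks across spectra lying within `window` (relative) of each group's first m/z."""
    clusters = []
    for mz in sorted(m for peaks in peak_lists for m in peaks):
        if clusters and mz - clusters[-1][0] <= window * clusters[-1][0]:
            clusters[-1].append(mz)
        else:
            clusters.append([mz])
    return [sum(c) / len(c) for c in clusters]  # mean m/z of each cluster


spec_a = [(1500, 100), (2500, 10), (2600, 50), (2700, 10), (5000, 10), (5010, 40), (5020, 10)]
spec_b = [(mz, 2 * i) for mz, i in spec_a]  # same profile at twice the intensity
norm_a, norm_b = tic_normalize([spec_a, spec_b])
print(norm_a[2], norm_b[2])  # (2600, 75.0) (2600, 75.0): equal after normalization
print(detect_peaks(norm_a, noise=5.0))  # [2600, 5010]; the 1500 matrix-region peak is ignored
print(cluster_peaks([[2600.0, 5010.0], [2604.0, 5013.0], [2650.0]]))  # [2602.0, 2650.0, 5011.5]
```

The default window of 0.003 corresponds to the upper end of the 0.2–0.3% range; applying identical cutoff, S/N threshold, and window to every spectrum is what keeps the resulting peak table comparable across samples.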

4. Notes

1. Use powder-free nitrile (not latex) gloves when processing SELDI ProteinChips. Repetitive peaks at 3000–4000 Da will appear in the spectra if samples are contaminated with latex.

2. Use sample sets of sufficient size. At least 30 samples should be included in each classification group to allow multivariate analysis and to give >90% statistical confidence in a single marker with p values



Liotta, L. A. (2002). Use of proteomic patterns in serum to identify ovarian cancer. Lancet, 359: 572–577.

4. de Noo, M. E., Mertens, B. J., Ozalp, A., Bladergroen, M. R., van der Werff, M. P., van de Velde, C. J., Deelder, A. M., and Tollenaar, R. A. (2006). Detection of colorectal cancer using MALDI-TOF serum protein profiling. Eur J Cancer, 42: 1068–1076.

5. Sidransky, D., Irizarry, R., Califano, J. A., Li, X., Ren, H., Benoit, N., and Mao, L. (2003). Serum protein MALDI profiling to distinguish upper aerodigestive tract cancer patients from control subjects. J Natl Cancer Inst, 95: 1711–1717.

6. Howard, B. A., Wang, M. Z., Campa, M. J., Corro, C., Fitzgerald, M. C., and Patz, E. F. Jr. (2003). Identification and validation of a potential lung cancer serum biomarker detected by matrix-assisted laser desorption/ionization-time of flight spectra analysis. Proteomics, 3: 1720–1724.

7. Baumann, S., Ceglarek, U., Fiedler, G. M., Lembcke, J., Leichtle, A., and Thiery, J. (2005). Standardized approach to proteome profiling of human serum based on magnetic bead separation and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Clin Chem, 51: 973–980.

8. Orvisky, E., Drake, S. K., Martin, B. M., Abdel-Hamid, M., Ressom, H. W., Varghese, R. S., An, Y., Saha, D., Hortin, G. L., Loffredo, C. A., and Goldman, R. (2006). Enrichment of low molecular weight fraction of serum for MS analysis of peptides associated with hepatocellular carcinoma. Proteomics, 6: 2895–2902.

9. Feuerstein, I., Rainer, M., Bernardo, K., Stecher, G., Huck, C. W., Kofler, K., Pelzer, A., Horninger, W., Klocker, H., Bartsch, G., and Bonn, G. K. (2005). Derivatized cellulose combined with MALDI-TOF MS: a new tool for serum protein profiling. J Proteome Res, 4: 2320–2326.

10. Rai, A. J., Gelfand, C. A., Haywood, B. C., Warunek, D. J., Yi, J., Schuchard, M. D., Mehigh, R. J., Cockrill, S. L., Scott, G. B., Tammen, H., Schulz-Knappe, P., Speicher, D. W., Vitzthum, F., Haab, B. B., Siest, G., and Chan, D. W. (2005). HUPO plasma proteome project specimen collection and handling: towards the standardization of parameters for plasma proteome samples. Proteomics, 5: 3262–3277.

11. Banks, R. E., Stanley, A. J., Cairns, D. A., Barrett, J. H., Clarke, P., Thompson, D., and Selby, P. J. (2005). Influences of blood sample processing on low-molecular-weight proteome identified by surface-enhanced laser desorption/ionization mass spectrometry. Clin Chem, 51: 1637–1649.

12. Villanueva, J., Philip, J., Entenberg, D., Chaparro, C. A., Tanwar, M. K., Holland, E. C., and Tempst, P. (2004). Serum peptide profiling by magnetic particle-assisted, automated sample processing and MALDI-TOF mass spectrometry. Anal Chem, 76: 1560–1570.

13. Guerrier, L., Thulasiraman, V., Castagna, A., Fortis, F., Lin, S., Lomas, L., Righetti, P. G., and Boschetti, E. (2006). Reducing protein concentration range of biological samples using solid-phase ligand libraries. J Chromatogr B Analyt Technol Biomed Life Sci, 833: 33–40.

14. Fountoulakis, M., Juranville, J. F., Jiang, L., Avila, D., Roder, D., Jakob, P., Berndt, P., Evers, S., and Langen, H. (2004). Depletion of the high-abundance plasma proteins. Amino Acids, 27: 249–259.



15. Lowenthal, M. S., Mehta, A. I., Frogale, K., Bandle, R. W., Araujo, R. P., Hood, B. L., Veenstra, T. D., Conrads, T. P., Goldsmith, P., Fishman, D., Petricoin, E. F. 3rd, and Liotta, L. A. (2005). Analysis of albumin-associated peptides and proteins from ovarian cancer patients. Clin Chem, 51: 1933–1945.

16. Mehta, A. I., Ross, S., Lowenthal, M. S., Fusaro, V., Fishman, D. A., Petricoin, E. F. 3rd, and Liotta, L. A. (2003). Biomarker amplification by serum carrier protein binding. Dis Markers, 19: 1–10.

17. Semmes, O. J., Feng, Z., Adam, B. L., Banez, L. L., Bigbee, W. L., Campos, D., Cazares, L. H., Chan, D. W., Grizzle, W. E., Izbicka, E., Kagan, J., Malik, G., McLerran, D., Moul, J. W., Partin, A., Prasanna, P., Rosenzweig, J., Sokoll, L. J., Srivastava, S., Srivastava, S., Thompson, I., Welsh, M. J., White, N., Winget, M., Yasui, Y., Zhang, Z., and Zhu, L. (2005). Evaluation of serum protein profiling by surface-enhanced laser desorption/ionization time-of-flight mass spectrometry for the detection of prostate cancer: I. Assessment of platform reproducibility. Clin Chem, 51: 102–112.

18. Rai, A. J., Stemmer, P. M., Zhang, Z., Adam, B. L., Morgan, W. T., Caffrey, R. E., Podust, V. N., Patel, M., Lim, L. Y., Shipulina, N. V., Chan, D. W., Semmes, O. J., and Leung, H. C. (2005). Analysis of human proteome organization plasma proteome project (HUPO PPP) reference specimens using surface enhanced laser desorption/ionization-time of flight (SELDI-TOF) mass spectrometry: multi-institution correlation of spectra and identification of biomarkers. Proteomics, 5: 3467–3474.

19. Semmes, O. J., Cazares, L. H., Ward, M. D., Qi, L., Moody, M., Maloney, E., Morris, J., Trosset, M. W., Hisada, M., Gygi, S., and Jacobson, S. (2005). Discrete serum protein signatures discriminate between human retrovirus-associated hematologic and neurologic disease. Leukemia, 19: 1229–1238.

20. Qian, H. G., Shen, J., Ma, H., Ma, H. C., Su, Y. H., Hao, C. Y., Xing, B. C., Huang, X. F., and Shou, C. C. (2005). Preliminary study on proteomics of gastric carcinoma and its clinical significance. World J Gastroenterol, 11: 6249–6253.

21. Ressom, H. W., Varghese, R. S., Abdel-Hamid, M., Eissa, S. A., Saha, D., Goldman, L., Petricoin, E. F., Conrads, T. P., Veenstra, T. D., Loffredo, C. A., and Goldman, R. (2005). Analysis of mass spectral serum profiles for biomarker selection. Bioinformatics, 21: 4039–4045.

22. Liu, J., Zheng, S., Yu, J. K., Zhang, J. M., and Chen, Z. (2005). Serum protein fingerprinting coupled with artificial neural network distinguishes glioma from healthy population or brain benign tumor. J Zhejiang Univ Sci B, 6: 4–10.

23. Qu, Y., Adam, B. L., Yasui, Y., Ward, M. D., Cazares, L. H., Schellhammer, P. F., Feng, Z., Semmes, O. J., and Wright, G. L. Jr. (2002). Boosted decision tree analysis of surface-enhanced laser desorption/ionization mass spectral serum profiles discriminates prostate cancer from noncancer patients. Clin Chem, 48: 1835–1843.

24. Papadopoulos, M. C., Abel, P. M., Agranoff, D., Stich, A., Tarelli, E., Bell, B. A., Planche, T., Loosemore, A., Saadoun, S., Wilkins, P., and Krishna, S. (2004). A novel and accurate diagnostic test for human African trypanosomiasis. Lancet, 363: 1358–1363.


8

Urine Sample Preparation and Protein Profiling by Two-Dimensional Electrophoresis and Matrix-Assisted Laser Desorption Ionization Time of Flight Mass Spectroscopy

Panagiotis G. Zerefos and Antonia Vlahou

Summary

Urine is the most easily obtained and, consequently, one of the most common samples in clinical analysis and diagnostics. However, urine is also considered one of the most difficult proteomic samples to work with, owing to its highly variable content and the presence of many proteins in low abundance or in modified forms. In this chapter, we describe simple protocols and troubleshooting tips for urinary protein preparation and profiling by two-dimensional electrophoresis or directly by matrix-assisted laser desorption ionization time of flight mass spectrometry. Direct dilution, protein precipitation, ultrafiltration, and solid phase extraction, combined with these profiling technologies, provide the means for reliable proteomic analysis of one of the most significant yet most complex biological samples.

Key Words: urine; 2DE; MALDI-TOF-MS; protein profiling; sample preparation.

Abbreviations: ACT: Acetone, CE: Capillary electrophoresis, CHAPS: 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate, CHCA: α-Cyano-4-hydroxycinnamic acid, d: Dalton, 2DE: Two-dimensional gel electrophoresis, DHB: Dihydroxybenzoic acid, DTE: 1,4-Dithioerythritol, IEF: Isoelectric focusing, IPG: Immobilized pH gradient, LC: Liquid chromatography, MALDI: Matrix-assisted laser desorption ionization, MS: Mass spectrometry, MW: Molecular weight, MWCO: Molecular weight cut-off, ns: Nanosecond, o/n: Overnight, RCF: Relative centrifugal force, SA: Sinapinic acid, SDS: Sodium dodecylsulfate, SELDI: Surface-enhanced laser desorption/ionization, SPE: Solid phase extraction, TCA: Trichloroacetic acid, TFA: Trifluoroacetic acid, TGS: Tris-Glycine-SDS, TOF: Time of flight, UF: Ultrafiltration

From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols<br />

Edited by: A. Vlahou © Humana Press, Totowa, NJ<br />



142 Zerefos and Vlahou<br />

1. Introduction

Biological fluids play a central role in clinical chemistry. Investigation of their cellular (cell number, morphology, etc.), biochemical (metabolites, biomolecules), and physicochemical (pH, transparency, absorption, etc.) attributes assists in formulating the clinical judgment on disease prognosis, diagnosis, and treatment. According to the International Union of Pure and Applied Chemistry, urine is the human fluid that contains water and metabolic products, is excreted by the kidneys, is stored in the bladder, and is normally discharged by way of the urethra. The protein content of urine is very low under normal conditions (1) and derives mainly from plasma proteins that are not retained by the renal glomeruli. The presence of proteins at high concentrations in urine is usually the result of disease or pharmaceutical treatment. Measurement of urinary protein, commonly expressed relative to urinary creatinine, is one of the most common clinical examinations and serves exactly this purpose: to assess unexpected protein excretion.

Besides soluble proteins, urine also contains proteins within exfoliated cells and in membrane vesicles known as exosomes (2). This chapter focuses on methods for the analysis of soluble urinary proteins; for a thorough description of the other urinary protein components, we refer the interested reader to the review by Pisitkun et al. (2).

Compared with other proteomic samples, urine is still less explored, mainly because it is a difficult and diverse sample. Its composition depends on age, sex, health, and drug intake. In addition, protein content varies greatly from day to day and between first-void, midstream, morning, and random-catch urine samples of a single donor. Despite these difficulties, protein markers of disease have been detected in urine and have been approved for use as adjuncts to clinical assays for disease diagnosis and prognosis (3,4). This justifies an in-depth analysis of the urinary proteome, particularly with contemporary proteomics technologies, with the objective of identifying novel diagnostic and prognostic biomarkers.

Specifically, the urine proteome has been studied thoroughly with a series of proteomics technologies. These include two-dimensional electrophoresis (5,6), liquid chromatography (LC) combined with mass spectrometry (MS) (7,8), matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) or surface-enhanced laser desorption/ionization (SELDI)-TOF profiling (9,10,11,12,13), capillary electrophoresis coupled to MS (14,15), and combinations thereof that implement several chromatographic and electrophoretic separation steps (15,16,17,18,19). The great interest in the urinary proteome is reflected in the recent establishment of the human urine and kidney proteome initiative (http://hkupp.kir.jp) within the Human Proteome Organization, which aims to integrate existing research efforts in this field.

In this chapter, we provide detailed protocols and troubleshooting tips, drawn from the authors' experience, for the preparation and analysis of urinary proteins by two-dimensional gel electrophoresis (2DE) or directly by MALDI-TOF-MS. We selected these two profiling approaches because the former is a classical high-resolution profiling approach (see also Chapters 4–6), whereas the latter offers the advantage of high throughput (see also Chapters 7 and 13). In general, urine analysis for the investigation of its protein content can be divided into three main steps: sample collection (usually performed at the physician’s office), protein extraction, and protein separation and detection. Each of these steps is crucial and significantly affects the output of the proteomics experiment. This chapter emphasizes the protein preparation/extraction methodologies, including ultrafiltration, precipitation, and solid phase extraction (SPE), as they complement 2DE and MALDI-TOF-MS profiling. Additional protein preparation methods exist, such as dialysis and ultracentrifugation (see Note 1); however, we have focused on the three aforementioned methods because of their simplicity, increased reproducibility, and overall compatibility with the 2DE and MALDI MS profiling approaches.

1.1. Protein Precipitation

Protein precipitation is a very common purification procedure for the isolation of macromolecules. Proteins denature and precipitate in solutions of extreme ionic strength, very low pH, or high concentrations of organic solvents; under such conditions, biopolymers cannot retain a conformation capable of sustaining their solubility. Commonly used reagents are ammonium sulfate ((NH₄)₂SO₄), used for salting out proteins at concentrations of 3 M; trichloroacetic acid (TCA), used at concentrations higher than 5% (w/v); and several organic solvents (ethanol, acetone, acetonitrile, chloroform, methanol, and isopropanol) at final concentrations higher than 50% (v/v). The choice of precipitation methodology depends primarily on the downstream analytical procedure. In general, salting out is avoided in proteomic sample preparation, because residual salts interfere with subsequent analysis by 2DE and mass spectrometry. TCA precipitation followed by acetone washes is very popular and efficient, especially for very dilute protein solutions. Organic solvents offer very high yields, but some of them are toxic (methanol, acetonitrile), while others, such as chloroform (also toxic), require rather complicated precipitation procedures. A detailed description of these approaches for urinary protein preparation is provided in Section 3.
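The final-concentration thresholds above translate into reagent volumes through a simple balance: adding V_add of stock at concentration C_stock to V_sample gives C_final = C_stock·V_add / (V_sample + V_add), so V_add = V_sample·C_final / (C_stock − C_final). The Python sketch below assumes additive volumes and a sample that contains none of the reagent; the stock concentrations in the example (100% w/v TCA, neat acetone) are illustrative assumptions, not the chapter's protocol values, and the function name is hypothetical.

```python
def volume_to_add(sample_volume, stock_conc, final_conc):
    """Volume of stock precipitant to add so the mixture reaches final_conc.

    Solves stock_conc * V_add = final_conc * (sample_volume + V_add). Concentrations share
    one unit (% w/v or % v/v); the result is in the unit of sample_volume.
    """
    if not 0 < final_conc < stock_conc:
        raise ValueError("final concentration must lie between 0 and the stock concentration")
    return sample_volume * final_conc / (stock_conc - final_conc)


urine_mL = 10.0
print(round(volume_to_add(urine_mL, 100.0, 10.0), 3))  # TCA to 10% (w/v) from 100% stock: 1.111
print(volume_to_add(urine_mL, 100.0, 50.0))            # acetone to 50% (v/v): 10.0, an equal volume
print(volume_to_add(urine_mL, 100.0, 80.0))            # acetone to 80% (v/v): 40.0, four volumes
```

The 50% case corresponds to adding an equal volume of solvent, the minimum called for in the organic solvent precipitation protocol (Subheading 3.2.2).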

1.2. Ultrafiltration-SPE

Ultrafiltration is a technique based on molecular filters combined with centrifugal force. The whole procedure is performed in a centrifuge at temperatures ranging from 4°C to ambient. It has many advantages; for example, proteins are kept in solution and are more easily handled. Major disadvantages are the cost of the approach and the fact that even traces of filter material, when eluted, cause significant problems in MS-based methodologies.

Solid phase extraction combined with MS is a recent addition to urine clinical proteomics (22). SPE in the form of magnetic particles was recently developed as the front end for direct profiling of biological fluids by MS (23).

We have found that acetone or TCA precipitation and ultrafiltration are very efficient urinary protein preparation approaches, highly compatible with 2DE analysis (Figs. 1, 2). For MALDI MS profiling, we favor ultrafiltration and SPE, as well as direct dilution of urine in MS-compatible buffers, as front-end protein preparation methods (see Note 2, Fig. 3). The detailed protocols are provided below.


Fig. 1. Comparison of urinary sample preparation approaches. Lanes correspond to: (1) marker, (2) urine starting material, (3) TCA/acetone precipitation supernatant, (4) TCA precipitate, (5) urine supernatant after 3 h centrifugation at 200,000 RCF, (6) protein pellet after ultracentrifugation of 5 mL urine, (7) urine filtrate after ultrafiltration through a 5 kd MWCO, and (8) urine retentate after ultrafiltration. In lanes 2, 3, 5, and 7, equal volumes of urine sample were used; similarly, lanes 4 and 8 correspond to the same amount of starting urine material, to facilitate comparison of the approaches.



2. Materials

2.1. Sample Collection, Handling, and Storage

1. Polypropylene aliquoting tubes (1.5, 2, 15, and 50 mL), Sarstedt Corporation (Nümbrecht, Germany)

2.2. Urine Sample Preparation/Protein Precipitation

2.2.1. TCA/Acetone Precipitation Protocol

1. Trichloroacetic acid, ultra pure (store solutions at 2–8°C), Sigma Corporation (St. Louis, MO, USA)

2. Acetone, analytical purity grade, Sigma Corporation

2.2.2. Organic Solvent Precipitation Protocol

1. Acetone, analytical purity grade, Sigma Corporation

2. Isopropanol, analytical purity grade, Sigma Corporation

3. Ethanol, analytical purity grade, Sigma Corporation

2.2.3. Urine Ultrafiltration

1. Amicon ultrafiltration devices, Millipore Corporation (Billerica, MA, USA)

2.2.4. Urine SPE

1. Bioselect C18 SPE cartridges, Grace Vydac (Columbia, MD, USA)

2. Methanol, high performance liquid chromatography (HPLC) grade, Sigma Corporation

3. Acetonitrile, HPLC grade, Sigma Corporation

4. Trifluoroacetic acid, HPLC grade, Sigma Corporation

2.3. Analytical/Profiling Techniques

2.3.1. Two-Dimensional Separation

1. Protean isoelectric focusing (IEF) cell, Biorad (Hercules, CA, USA)

2. Nonlinear immobilized pH gradient (IPG) strips, pH 3–10, 17 cm long

3. 2DE sample buffer: 7 M urea, 2 M thiourea, 4% (w/v) CHAPS, 0.4% (w/v) 1,4-dithioerythritol (DTE), 2% (w/v) IPG buffer (Biorad); all components are of molecular biology grade

4. Mineral oil

5. Equilibration buffer I: 6 M urea, 50 mM Tris–HCl, pH 8.8, 30% glycerol, 2.0% sodium dodecylsulfate (SDS), 30 mM DTE

6. Equilibration buffer II: 6 M urea, 50 mM Tris–HCl, pH 8.8, 30% (v/v) glycerol, 2.0% (w/v) SDS, 230 mM iodoacetamide. All components are of molecular biology grade



7. Fixation solution: 5% (w/v) phosphoric acid (p.a. grade, Sigma), 50% (v/v) methanol (HPLC grade, Sigma)

8. Colloidal Coomassie brilliant blue staining kit, Invitrogen (Carlsbad, CA, USA)

9. GS-800 calibrated densitometer and PDQuest software, Biorad

2.3.2. MALDI-TOF-MS

1. Matrix solution: 50% (v/v) acetonitrile, 0.1% (v/v) trifluoroacetic acid (TFA), 0.75% α-cyano-4-hydroxycinnamic acid (CHCA; Sigma Corporation). Caution: all MALDI matrices are light sensitive; avoid unnecessary light exposure. Fresh preparation is advised; otherwise, store at 4°C for 1 week at most

2. MALDI ground steel target plate

3. Ultraflex I MALDI-TOF-TOF-MS (Bruker Daltonics, Bremen, Germany)

4. FlexAnalysis 2.2 software, Bruker Daltonics

2.4. Miscellaneous

The HPLC grade water (resistivity >18 MΩ·cm, total organic carbon (TOC)




Fig. 2. Two-dimensional profiling of (A) 24 h collected urine concentrated by (1) ultrafiltration through a 5000 MWCO, (2) TCA precipitation, (3) acetone precipitation without washing of the protein pellet, and (4) acetone precipitation with pellet washing. In these cases (A1–A4), the starting material was preconcentrated via membrane filtration (Pellicon 2 system, Millipore Corporation); ultrafiltration and TCA or acetone precipitation, as applicable, were then applied to further concentrate the sample prior to 2DE analysis. (B) Two-dimensional profiling of random catch urine (50 mL starting volume, without any preconcentration) concentrated via (1) ultrafiltration through a 5000 MWCO and (2) acetone precipitation. In all cases, 1 mg of protein was analyzed on pH 3–10 nonlinear IPG strips and visualized with colloidal Coomassie stain.

6. Let the pellet dry at ambient temperature (see Note 11).

7. Solubilize the pellet in 2DE sample buffer and proceed with 2DE analysis (see Subheading 3.3.1, Note 12, and Fig. 2).

8. The protein pellet may also be solubilized in MS-compatible buffers and analyzed by MS profiling (see Note 13, Subheading 3.3.2, and Fig. 3).

3.2.2. Organic Solvent Precipitation Protocol

1. Add at least an equal volume of the desired organic solvent (ethanol, acetone, or isopropanol) to the urine sample and mix (see Notes 14, 15).

2. Keep at –20°C o/n (see Note 16).



[Fig. 3 image: MALDI-TOF mass spectra, panels A–G; x-axis, mass to charge (1000–10,000); y-axis, intensity (×10⁴).]

Fig. 3. MALDI-TOF-MS profiling of urine. (A) Ultrafiltration retentate (5000 MWCO), diluted 10× in 0.1% TFA; (B) 10× dilution of urine in 0.1% TFA; (C) supernatant of urine (diluted in 0.1% TFA) after protein precipitation with acetone; (D) urine protein pellet from acetone precipitation, reconstituted in 0.1% TFA; (E) urine protein pellet from acetone precipitation, reconstituted in 50% acetonitrile, 0.1% TFA; (F) acetone precipitation supernatant, further purified by C18-SPE and then diluted in 0.1% TFA; (G) C18-SPE eluate in 50% acetonitrile, 0.1% TFA. Extensive reproducibility studies indicated that, of the methods tested, urine processing by ultrafiltration or by direct dilution in 0.1% TFA provides the most robust spectra. Adapted from (13).

3. Centrifuge in a standard refrigerated benchtop centrifuge (for Eppendorf-type tubes) for 15 min at 16,000–17,000 × g and 4°C. Discard the supernatant.

4. Wash the pellet with ice-cold acetone, leave for 5–10 min at –20°C, and centrifuge again. Discard the supernatant and repeat the washing step once more (see Note 17).

5. Let the pellet dry at ambient temperature.

6. Solubilize the pellet and proceed with 2DE analysis. The protein pellet (solubilized in MS-compatible buffers) or the supernatant may also be analyzed by MS profiling (see Notes 12 and 13, Subheadings 3.3.1 and 3.3.2, and Figs. 2 and 3).
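Step 1 calls for at least an equal volume of solvent, and Note 15 quotes solvent-to-sample ratios of about 3:1 for dilute urine and about 9:1 for more concentrated samples. The Python sketch below converts a ratio into a solvent volume and the resulting organic fraction of the mixture. The function names are illustrative, and the organic fraction assumes additive volumes, which is only approximately true for water–acetone or water–ethanol mixtures.

```python
def solvent_volume_ml(sample_volume_ml: float, solvent_to_sample: float) -> float:
    """Organic solvent volume for a given solvent-to-sample ratio (v/v)."""
    if solvent_to_sample < 1:
        raise ValueError("step 1 calls for at least an equal volume (ratio >= 1)")
    return sample_volume_ml * solvent_to_sample


def organic_fraction(solvent_to_sample: float) -> float:
    """Final organic fraction (v/v) of the mixture, assuming additive volumes."""
    return solvent_to_sample / (solvent_to_sample + 1)


if __name__ == "__main__":
    sample_ml = 1.0  # hypothetical urine aliquot
    for ratio in (1, 3, 9):  # minimum from step 1 and the ratios from Note 15
        print(f"{ratio}:1 -> {solvent_volume_ml(sample_ml, ratio):.1f} mL solvent, "
              f"{organic_fraction(ratio):.0%} organic")
```

The 1:1, 3:1, and 9:1 ratios correspond to roughly 50%, 75%, and 90% organic solvent; at 9:1 even a 1-mL aliquot needs 9 mL of solvent, which affects the choice of tube.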

3.2.3. Urine Ultrafiltration

1. Load one volume of urine onto an Amicon ultrafiltration device with a 5-kDa molecular weight cutoff (MWCO) (see Notes 18–20).


Urine Sample Preparation 149<br />

2. Spin in a refrigerated centrifuge at 3500 × g and 8–12°C (see Notes 21 and 22).

3. After concentration, collect the retentate; discard or keep the filtrate, depending on the specific application (see Notes 23–25).

4. For 2DE, add the appropriate volume of sample buffer to the retentate and proceed with IEF (see Notes 26 and 27, Subheading 3.3.1, and Fig. 2).

5. For MALDI profiling, dilute the retentate 10-fold with 0.1% (v/v) TFA and proceed as described below (see Subheading 3.3.2 and Fig. 3).
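The protocols specify centrifugal force as RCF (3500 × g for ultrafiltration, 16,000–17,000 × g for solvent precipitation, and 4000–5000 × g with longer times in Note 8), while many centrifuges are set in rpm. The Python sketch below applies the standard conversion RCF = 1.118 × 10⁻⁵ × r × rpm², with r the rotor radius in cm. The rotor radius in the example is hypothetical; use the value given for your own rotor.

```python
import math

# Widely used relation between relative centrifugal force (RCF, in multiples
# of g) and rotor speed: RCF = 1.118e-5 * r * rpm**2, with r in centimetres.
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Return the RCF (x g) reached at `rpm` for a rotor radius in cm."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_for_rcf(rcf: float, radius_cm: float) -> float:
    """Return the rotor speed (rpm) needed to reach `rcf` at `radius_cm`."""
    if rcf < 0 or radius_cm <= 0:
        raise ValueError("rcf must be >= 0 and radius_cm > 0")
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


if __name__ == "__main__":
    radius = 8.4  # hypothetical microcentrifuge rotor radius (cm); check your rotor
    for target in (16_000, 17_000):
        print(f"{target:,} x g -> {rpm_for_rcf(target, radius):,.0f} rpm")
```

Because RCF grows with the square of rotor speed, the same rpm setting yields different forces on rotors of different radii, which is why the protocol states RCF rather than rpm.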

3.2.4. Urine SPE (see Note 28)

1. Activate the cartridge with a total of 1 mL of methanol (two applications of 500 μL each).

2. Wash the cartridge with 2 mL of acetonitrile (four applications of 500 μL each; see Note 29).

3. Equilibrate the cartridge with a total of 1 mL of 0.1% (v/v) TFA (two applications of 500 μL each).

4. Load the cartridge with 1 mL of urine acidified with TFA to a final concentration of 0.1% (v/v).

5. Wash the cartridge with 1 mL of 0.1% (v/v) TFA (two applications of 500 μL each).

6. Elute the bound compounds with 100 μL of 50% acetonitrile, 0.1% (v/v) TFA.

7. Place 1 μL of the eluate on the MALDI target and process for MALDI MS profiling (see Subheading 3.3.2 and Fig. 3).
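Because every SPE step uses fixed aliquots, reagent consumption for a batch of cartridges and the nominal enrichment (1 mL loaded, 100 μL eluted) follow directly from the steps above. The Python sketch below encodes the steps as data; the reagent labels are plain strings chosen here, and the 10-fold enrichment assumes complete retention and elution, which a real cartridge will not achieve for every analyte. A practical batch would also add some excess volume for pipetting losses.

```python
from collections import defaultdict

# (action, reagent, aliquot volume in uL, number of aliquots), transcribed
# from steps 1-6 of the SPE protocol.
SPE_STEPS = [
    ("activate", "methanol", 500, 2),
    ("wash", "acetonitrile", 500, 4),
    ("equilibrate", "0.1% TFA", 500, 2),
    ("load", "urine in 0.1% TFA", 1000, 1),
    ("wash", "0.1% TFA", 500, 2),
    ("elute", "50% acetonitrile, 0.1% TFA", 100, 1),
]


def reagent_totals_ul(n_cartridges: int = 1) -> dict:
    """Total volume (uL) of each reagent needed for n cartridges."""
    totals = defaultdict(int)
    for _action, reagent, aliquot_ul, n_aliquots in SPE_STEPS:
        totals[reagent] += aliquot_ul * n_aliquots * n_cartridges
    return dict(totals)


def enrichment_factor() -> float:
    """Nominal enrichment: loaded volume / elution volume (full recovery assumed)."""
    load = next(a * n for act, _, a, n in SPE_STEPS if act == "load")
    elute = next(a * n for act, _, a, n in SPE_STEPS if act == "elute")
    return load / elute


if __name__ == "__main__":
    for reagent, volume in reagent_totals_ul(n_cartridges=8).items():
        print(f"{reagent}: {volume / 1000:.1f} mL")
    print(f"nominal enrichment: {enrichment_factor():.0f}x")
```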

3.2.5. Direct Dilution of Urine

This method is used only in conjunction with direct MALDI MS profiling.

• Dilute urine 10-fold with 0.1% (v/v) TFA (see Notes 30 and 31).

• Apply 1 μL of the diluted urine sample to the MALDI target.

• Apply 1 μL of matrix solution.

• Proceed with MALDI-TOF-MS (see Subheading 3.3.2 and Fig. 3).
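Note 30 recommends trying several dilutions of the urine sample to find the concentration range in which spectral quality is unaffected. A small pipetting table, sketched below in Python, keeps such a series consistent. It assumes that "N-fold" means 1 part urine in N parts total (so 10-fold is 1 part urine plus 9 parts 0.1% TFA); the factors and final volume are illustrative choices, not values from the protocol.

```python
def dilution_series(fold_factors, final_volume_ul: float = 20.0):
    """Pipetting table for diluting urine in 0.1% (v/v) TFA.

    Returns (fold, urine_ul, diluent_ul) tuples, where "N-fold" means
    1 part urine in N parts total.
    """
    table = []
    for fold in fold_factors:
        if fold < 1:
            raise ValueError(f"invalid dilution factor: {fold}")
        urine_ul = final_volume_ul / fold
        table.append((fold, urine_ul, final_volume_ul - urine_ul))
    return table


if __name__ == "__main__":
    # Hypothetical series bracketing the default 10-fold dilution.
    for fold, urine, tfa in dilution_series([5, 10, 20], final_volume_ul=100.0):
        print(f"{fold:>3}-fold: {urine:5.1f} uL urine + {tfa:5.1f} uL 0.1% TFA")
```

Keeping the final volume reasonably large (here 100 μL) avoids sub-microliter pipetting at high dilution factors, while only 1 μL of each dilution is spotted.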

3.3. Analytical/Profiling Techniques

3.3.1. Two-dimensional Separation

1. Measure the protein concentration of the sample (pretreated by precipitation or ultrafiltration) with a commercially available protein assay kit.

2. Dilute 0.5–1 mg of urinary protein in 300 μL of 2DE sample buffer (see Note 32).

3. Distribute the sample volume evenly along a lane of the IEF focusing tray.

4. Carefully place the strip gel side down, in contact with the electrodes (see Note 33).

5. Rehydrate actively for 16 h at 50 V and 20°C. Caution: do not cover the strip with mineral oil immediately, but only after 1 h of rehydration (see Note 34).

6. After rehydration, place moistened IEF paper wicks between the strip and the electrodes.

7. Start IEF. A typical program is 250 V for 30 min, a linear increase to 5000 V over 12 h, and then 5000 V for 16 h (total approximately 110,000 V-h) (see Note 35).



8. After IEF is complete, equilibrate the strip in 10 mL of equilibration buffer I for 20 min at ambient temperature.

9. Alkylate in 10 mL of equilibration buffer II for 20 min (see Note 36).

10. Place the strip on top of a 12.5% polyacrylamide gel, cover with 0.5% melted agarose in TGS buffer, and start the second dimension. Run at 10 mA for 1 h and then at 40 mA for approximately another 4 h (see Note 37).

11. Fix the gel for 2 h in fixation solution.

12. Stain overnight with colloidal Coomassie blue (Fig. 2).
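The IEF program in step 7 mixes constant-voltage steps with a linear ramp, and its total is quoted in volt-hours. The Python sketch below integrates such a program segment by segment, so the total can be checked or recomputed when the program is adapted (Note 35). It assumes the ramp starts from the 250 V of the first step; the function and variable names are illustrative.

```python
def volt_hours(program: list[tuple[float, float, float]]) -> float:
    """Integrate a focusing program into volt-hours (V-h).

    Each segment is (start_V, end_V, duration_h). A constant-voltage step has
    start_V == end_V; a linear ramp contributes its mean voltage times duration.
    """
    return sum((v_start + v_end) / 2 * hours for v_start, v_end, hours in program)


# Program from step 7, assuming the ramp starts from the 250 V of the first step.
IEF_PROGRAM = [
    (250, 250, 0.5),     # 250 V for 30 min
    (250, 5000, 12.0),   # linear increase to 5000 V over 12 h
    (5000, 5000, 16.0),  # 5000 V for 16 h
]

if __name__ == "__main__":
    print(f"nominal total: {volt_hours(IEF_PROGRAM):,.0f} V-h")
```

The nominal sum is 125 + 31,500 + 80,000 = 111,625 V-h, consistent with the approximately 110,000 V-h quoted in step 7. On a current-limited instrument the voltage actually reached, and hence the delivered V-h, may be lower than this nominal value.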

3.3.2. MALDI-TOF-MS

1. Place 1 μL of sample and 1 μL of matrix solution on the MALDI target and mix on the spot (dried-droplet technique; see Notes 38 and 39).

2. Leave the target to dry at ambient temperature in the dark.

3. Load the target into the instrument and execute the appropriate MS method. Run the instrument in linear mode (see Note 40).

4. Optimize ion acceleration; adjusting detector sensitivity is not recommended before the MS method has been established (see Note 41).

5. Set pulsed ion extraction (delayed ion acceleration) according to the profiling region in use. Typically, when α-cyano-4-hydroxycinnamic acid (CHCA) is used, delays of 50–150 ns are applied for large peptides (3–5 kDa), 150–300 ns for small proteins (~15 kDa), and more than 300 ns for proteins (>20 kDa) (see Notes 42 and 43).

6. Collect 1000–2000 laser shots per sample and sum the collected data (see Note 44).
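Step 5 ties the delayed-extraction time to the mass region being profiled, and Note 42 recommends acquiring each subregion separately. The Python sketch below turns the quoted CHCA values into a lookup. The boundaries at 5 kDa and 20 kDa between the stated anchor points, and the example subregion split, are assumptions made for illustration, not instrument defaults.

```python
import math


def delay_window_ns(mass_da: float) -> tuple[float, float]:
    """Suggested delayed-extraction window (ns) when CHCA is the matrix.

    Anchor points from step 5: 50-150 ns for large peptides (3-5 kDa),
    150-300 ns for small proteins (~15 kDa), >300 ns above 20 kDa.
    Assumed boundaries: <= 5 kDa -> first window (masses below 3 kDa are
    not covered by the protocol and simply fall in it), 5-20 kDa -> second,
    > 20 kDa -> open-ended third window.
    """
    if mass_da <= 0:
        raise ValueError("mass must be positive")
    if mass_da <= 5_000:
        return (50.0, 150.0)
    if mass_da <= 20_000:
        return (150.0, 300.0)
    return (300.0, math.inf)


# Hypothetical subregion split (Note 42); each region is acquired separately.
SUBREGIONS_DA = [(3_000, 5_000), (10_000, 20_000), (20_000, 30_000)]

if __name__ == "__main__":
    for low, high in SUBREGIONS_DA:
        lo_ns, hi_ns = delay_window_ns((low + high) / 2)
        upper = "open" if math.isinf(hi_ns) else f"{hi_ns:.0f}"
        print(f"{low}-{high} Da: {lo_ns:.0f}-{upper} ns")
```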

4. Notes

1. Dialysis is one of the classical methods for buffer exchange and for separating high- from low-molecular-weight constituents of a sample. Although it has been used elsewhere (20), we consider it rather laborious and costly, and it serves only purification, not concentration, purposes. Ultracentrifugation has been applied (21) to isolate higher molecular weight urinary proteins prior to 2DE (Fig. 1). In our opinion, centrifugal isolation of proteins is a diverse and complicated process, and reproducibility is consequently compromised. Pelleting biopolymers by ultracentrifugation requires solutions of precisely defined composition, so that the centrifugation speed needed for protein isolation can be derived from the theoretical Svedberg values. Urine samples vary too much in density (d = m/v) and pH to serve such purposes in a well-defined and reproducible manner.

2. It should be emphasized that the various methods are highly complementary; combining different methods is therefore recommended to increase protein resolution.

3. Urine samples can be first-void, midstream, morning, random-catch, or 24-h collections. Because of its high bacterial content, first-morning urine is usually not recommended in biomarker discovery studies.



4. After collection, urine samples should be stored at 4°C if they are not frozen at –80°C immediately. Published data (9,10) indicate that, for analysis by 2DE or SELDI/MALDI MS, the generated proteomic profiles are usually stable for up to 24 h of urine storage at 4°C prior to deep freezing. We have occasionally observed profile changes after such prolonged storage at 4°C, and we therefore favor shorter times.

5. The soluble supernatant can be enriched in cellular proteins by applying mild sonication (5–10 min in a sonicator bath) before the centrifugation step.

6. The volume of urine required depends on the specific downstream application. For 2DE analysis, an aliquot of at least 15 mL of urine is required. For direct MALDI MS profiling, a 1-mL urine aliquot is sufficient.

7. TCA can be added as a solid to a final concentration of 15% (w/v) (TCA is extremely hygroscopic and dissolves easily). Alternatively, the appropriate volume of 100% (w/v) TCA may be added to the urine sample to reach a final concentration of 15% (w/v). TCA precipitation can also be performed overnight at –20°C, occasionally with slightly better efficiency. Caution: at –20°C or lower temperatures, TCA solutions may form aqueous–organic bilayer systems, depending on the salt concentration of the urine. The precipitation efficiency depends on the protein concentration of the sample. In our experience, for example, the precipitation yield for a starting material with a protein concentration of 0.5 mg/mL (i.e., 1 mg of total protein in 2 mL of sample) ranges from 40 to 70%; in contrast, the precipitation efficiency for a starting material with a protein concentration of 0.1 mg/mL (i.e., 1 mg of protein in 10 mL of sample) is 0–30%. For this reason, avoid adding TCA solution to very dilute protein samples.

8. If the highest available centrifugal force is only 4000–5000 × g, longer centrifugation times (45 min) are recommended.

9. The volume of acetone used for washing depends on the size of the protein pellet. As a general rule, use 1 mL of acetone for every 1 mL of urine starting material.

10. Acetone washes are needed to drive off excess TCA; otherwise the pellet remains strongly acidic and can exhaust the buffering capacity of the buffers used in subsequent steps. In addition, TCA (a nonvolatile acid) may interfere with IEF, PAGE, LC, or MS analysis. We have found that acetone washes of the pellet do not cause significant protein losses.

11. The pellet should not be dried completely, since this makes its subsequent solubilization in 2DE or other buffers difficult. For the same reason, evaporating the acetone at elevated temperatures is not recommended.

12. If the pellet does not dissolve, try mild sonication (5 min in a sonicator bath) or incubate at ambient temperature for 30 min with intermittent vortexing. Heating should be avoided, however, particularly if the pellet is resuspended in 2DE buffer, since heating decomposes urea into products that react with (carbamylate) amino acid residues. The buffer volume required for solubilization depends on the protein content (pellet size) and the type of downstream application (2DE or MALDI-TOF-MS).



13. The protein pellet may be solubilized in 0.1% (v/v) TFA (roughly 100 μL of solubilization buffer for every milliliter of urine starting material) and analyzed by MALDI-TOF-MS. In our experience, however, plasticizers, possibly extracted during the precipitation process, are frequently detected, and reproducibility problems are observed. Therefore, unless additional purification steps (SPE, etc.) are introduced, we do not favor precipitation methods as the front end of MALDI MS profiling.

14. Ethanol, acetone, or isopropanol is favored. These solvents are less polar than water, miscible with water even at elevated salt concentrations, of relatively low toxicity, and volatile. In particular, we favor acetone, since it is cheap, extremely volatile, and rarely forms aqueous–organic bilayers. Organic solvent mixtures (e.g., isopropanol–acetone) do not increase precipitation efficiency; in our experience, their use induces reproducibility problems, and they are therefore not recommended.

15. The solvent-to-sample ratio depends on the downstream application and the protein concentration of the sample. For dilute urine samples (protein concentrations in the micrograms-per-milliliter range), a solvent-to-sample ratio of 3:1 (v/v) provides relatively high precipitation efficiencies. We have observed that for more concentrated samples (e.g., preconcentrated urine or, in general, starting material with protein content in the milligrams-per-milliliter range), the precipitation efficiency for lower MW constituents reaches its maximum at a solvent-to-sample ratio of about 9:1.

16. Precipitation is most efficient at –20°C (lower efficiencies have been observed at 4°C, whereas at –80°C bilayer systems may form, which inhibit the procedure).

17. Acetone washes of the pellet are not customary in organic solvent precipitation protocols. In our experience, however, washing offers clear advantages, especially when 2DE separation is the downstream application, since salts and other interfering substances are removed (Fig. 2). This washing step makes 2DE gels produced after acetone precipitation as good as those generated after TCA precipitation. Acetone washing causes negligible protein losses.

18. Amicon UF devices are available for sample volumes of up to 4 mL (UF4) or 15 mL (UF15). We routinely use the UF4 devices when MALDI MS profiling is to be performed and the UF15 devices when 2DE is the downstream application.

19. Amicon devices are available with several MWCOs. We propose a 5-kDa MWCO for the isolation and concentration of the “total” urine protein content. Different MWCOs are advised for the specific isolation of molecular weight groups (see also Note 25). It should be emphasized that UF is not an absolute size-exclusion separation method, and cross-contamination between different protein size groups is expected and regularly observed.

20. UF can be performed in the presence of chemical additives. The choice of additives depends on the downstream application (2DE, MALDI profiling, LC-MS, etc.), since chemical compatibility with that application must be maintained in all cases. For example, we have observed that for direct MALDI MS profiling most additives (detergents such as octyl-glucopyranoside, Triton X-100, and Tween-20, and organic solvents such as trifluoroethanol



and isopropanol



(e.g., phospho- or glycopeptides) is feasible, and this is what differentiates SPE from other sample preparation steps. In our view, SPE in combination with direct MS profiling is encouraged.

29. All chromatographic and SPE media contain residues and plasticizers, which should be washed off before analyte binding. Failure to perform this step may result in complete ion suppression during MALDI profiling.

30. The user may have to try different dilutions of the urine sample. In MALDI MS profiling experiments, there is a range of protein concentrations within which spectral quality is not affected. Preliminary experiments are advised to determine this range.

31. In addition to TFA, several additives (urea, octyl-glucopyranoside, Triton X-100, Tween-20, NP-40, cholate, and organic solvents) have been tested at MALDI MS-compatible concentrations for their effect on urinary peptide and protein ionization. However, we did not observe any clear advantage in protein resolution or ionization in these cases.

32. The recommended protein amount of 0.5–1 mg is suitable for 17- to 18-cm strips with a pH range of 3–10 or 4–7. The protein amount will vary if different strip types are used, according to the manufacturer’s guidelines (for additional tips on 2DE, see Chapters 4–6).

33. Sample application during strip rehydration, rather than cup loading, was found to provide better resolution in 2DE analysis of urine.

34. Direct addition of the mineral oil might cause extraction of hydrophobic proteins into the oil layer.

35. These running conditions are for the analysis of 1 mg of protein on wide-range (3–10 or 4–7) 17- or 18-cm IPG strips. The program will vary depending on the sample quantity and the type of strip in use.

36. Reduction and alkylation are necessary for higher protein resolution in SDS-PAGE and also for protein identification by peptide mass fingerprinting.

37. The low starting current is needed for the slow migration of the proteins from the strip into the polyacrylamide gel. Direct electrophoresis at 40 mA may cause protein losses. Alternatively, the gel may be run at 10 mA overnight. Although slower, the latter approach provides gels of higher resolution, in our experience (for additional tips on 2DE, see Chapters 4–6).

38. Several sample application techniques were tested (thin-layer preparation, double layer, and variations of dried droplet). Of these, we found dried droplet (with simultaneous sample and matrix application) to be the simplest, fastest, and most reliable method. In addition, drying sample and matrix solution together (rather than sample and matrix separately) increases reproducibility and minimizes losses during subsequent spot washes. In contrast, if sample and matrix are mixed before application on the target, their consumption is much higher and the sample’s exposure to plastics increases, raising the chances of sample contamination and subsequent ion suppression by plasticizers.



39. If crystal formation is impaired by a high salt content in the sample, wash the spot by pipetting two to three times with 2 μL of cool 0.1% (v/v) TFA (let it dry again; do not wipe it dry). Always prefer spot-by-spot washing to washing the entire target, in order to avoid sample cross-contamination.

40. Instrument calibration is performed according to the manufacturer’s specifications. In any case, we propose daily calibration to ensure precision and accuracy.

41. Acceleration of biomolecules is affected primarily by the voltage settings of the ion source. Settings of the analyzer (TOF) mainly affect resolution, while detector settings should be adjusted only to improve the signal-to-noise characteristics of a given sample.

42. The mass spectrum should be divided into subregions, and the data for each subregion should be collected separately in order to increase protein resolution. This is because ionization kinetics (and consequently the instrument settings) differ completely for different protein sizes.

43. Different matrices (e.g., CHCA or dihydroxybenzoic acid for peptides and sinapinic acid [SA] for proteins) require different laser focusing settings. In general, large crystals (such as those formed by SA) and larger protein molecules require more concentrated energy bursts than smaller ones, for which more dispersed hits may be used.

44. Always sum the same number of laser shots and sample as many regions of a spot as possible to ensure high reproducibility.
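Note 44 and step 6 of Subheading 3.3.2 call for summing a fixed number of laser shots drawn from as many spot regions as possible. Instrument acquisition software normally does this; the plain-Python sketch below only illustrates the bookkeeping. It assumes that single-shot spectra are available as intensity lists on a common m/z axis and that the shot count is split evenly across regions; both are assumptions made here, not part of the protocol.

```python
def sum_spectra(region_shots: list[list[list[float]]],
                shots_per_sample: int) -> list[float]:
    """Sum a fixed number of single-shot spectra, drawn evenly from spot regions.

    region_shots[i] holds the single-shot intensity lists acquired in region i
    of one spot; all spectra share one m/z axis. Taking the same total number
    of shots for every sample keeps summed intensities comparable (Note 44).
    """
    n_regions = len(region_shots)
    if n_regions == 0 or shots_per_sample <= 0 or shots_per_sample % n_regions:
        raise ValueError("shots_per_sample must be positive and split evenly across regions")
    per_region = shots_per_sample // n_regions
    selected = []
    for shots in region_shots:
        if len(shots) < per_region:
            raise ValueError("a region has fewer shots than required")
        selected.extend(shots[:per_region])
    width = len(selected[0])
    if any(len(spectrum) != width for spectrum in selected):
        raise ValueError("all spectra must share the same m/z axis")
    # zip(*selected) walks the m/z axis; each tuple holds one point from every shot.
    return [sum(points) for points in zip(*selected)]


if __name__ == "__main__":
    # Two hypothetical regions with two 3-point shots each.
    regions = [[[1, 2, 3], [1, 1, 1]], [[0, 2, 0], [2, 2, 2]]]
    print(sum_spectra(regions, shots_per_sample=4))
```

Raising an error when a region is short of shots, rather than silently summing fewer, keeps the "same number of shots" rule from being broken unnoticed.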

Acknowledgments

This study was supported by the Greek Ministry of Health.

References

1. Norden, G.W.A., Sharratt, P., Cutillas, P.R., Cramer, R., Gardner, S.C. and Unwin, R.J. (2004) Quantitative amino acid and proteomics analysis: Very low excretion of polypeptides >750 Da in normal urine. Kidney International 66, 1994–2003.

2. Pisitkun, T., Johnstone, R. and Knepper, M.A. (2006) Discovery of urinary biomarkers. Molecular and Cellular Proteomics 5, 1760–1771.

3. Nielsen, M.E., Schaeffer, E.M., Veltri, R.W., Schoenberg, M.P. and Getzenberg, R.H. (2006) Urinary markers in the detection of bladder cancer: What’s new? Current Opinion in Urology 16, 350–355.

4. Thongboonkerd, V. and Malasit, P. (2005) Renal and urinary proteomics: Current applications and challenges. Proteomics 5, 1033–1042.

5. Pieper, R., Gatlin, C.L., McGrath, A.M., Makusky, A.J., Mondal, M., Seonarain, M., Field, E., Schatz, C.R., Estock, M.A., Ahmed, N., Anderson, N.G. and Steiner, S. (2004) Characterization of the human urinary proteome: A method for high-resolution display of urinary proteins on two-dimensional electrophoresis gels with a yield of nearly 1400 distinct protein spots. Proteomics 4, 1159–1174.

6. Oh, J., Pyo, J., Jo, E., Hwang, S., Kang, S., Jung, J., Park, E., Kim, S., Choi, J. and Lim, J. (2004) Establishment of a near-standard two-dimensional human urine proteomic map. Proteomics 4, 3485–3497.

7. Spahr, C.S., Davis, M.T., McGinley, M.D., Robinson, J.H., Bures, E.J., Beierle, J., Mort, J., Courchesne, P.L., Chen, K., Wahl, R.C., Yu, W., Luethy, R. and Patterson, S.D. (2001) Towards defining the urinary proteome using liquid chromatography-tandem mass spectrometry I. Profiling an unfractionated tryptic digest. Proteomics 1, 93–107.

8. Cutillas, P.R., Norden, A., Cramer, R., Burlingame, A. and Unwin, R.J. (2003) Detection and analysis of urinary peptides by on-line liquid chromatography and mass spectrometry: Application to patients with renal Fanconi syndrome. Clinical Science 104, 483–490.

9. Schaub, S., Wilkins, J., Weiler, T., Sangster, K., Rush, D. and Nickerson, P. (2004) Urine protein profiling with SELDI TOF MS. Kidney International 65, 323–332.

10. Rogers, M.A., Clarke, P., Noble, J., Munro, N.P., Paul, A., Selby, P.J. and Banks, R.E. (2003) Proteomic profiling of urinary proteins in renal cancer by surface enhanced laser desorption ionization and neural-network analysis: Identification of key issues affecting potential clinical utility. Cancer Research 63, 6971–6983.

11. Vlahou, A., Schellhammer, P.F., Mendrinos, S., Patel, K., Kondylis, F.I., Gong, L., Nasim, S. and Wright, G.L. Jr. (2001) Development of a novel proteomic approach for the detection of transitional cell carcinoma of the bladder in urine. The American Journal of Pathology 158, 1491–1502.

12. Vlahou, A., Giannopoulos, A., Gregory, B.W., Manousakas, T., Kondylis, F.I., Wilson, L.L., Schellhammer, P.F., Semmes, O.J. and Wright, G.L. Jr. (2004) Protein profiling in urine for the diagnosis of bladder cancer. Clinical Chemistry 50, 1438–1445.

13. Zerefos, P.G., Prados, J., Kalousis, A. and Vlahou, A. (2007) Sample preparation and bioinformatics in MALDI profiling of urinary proteins. Journal of Chromatography B: Analytical Technologies in the Biomedical and Life Sciences 15, 20–30.

14. Zürbig, P., Renfrow, M.B., Schiffer, E., Novak, J., Walden, M., Wittke, S., Just, I., Pelzing, M., Neusüß, C., Theodorescu, D., Root, K.E., Ross, M.M. and Mischak, H. (2006) Biomarker discovery by CE-MS enables sequence analysis via MS/MS with platform-independent separation. Electrophoresis 27, 2111–2125.

15. Mischak, H., Kaiser, T., Walden, M., Hillmann, M., Wittke, S., Herrmann, A., Knueppel, S., Haller, H. and Fliser, D. (2004) Proteomic analysis for the assessment of diabetic renal damage in humans. Clinical Science 107, 485–495.

16. Zerefos, P.G., Vougas, K., Dimitraki, P., Kossida, S., Petrolekas, A., Stravodimos, K., Giannopoulos, A., Fountoulakis, M. and Vlahou, A. (2006) Characterization of the human urine proteome by preparative electrophoresis in combination with 2-DE. Proteomics 6, 4346–4355.

17. Pang, J.X., Ginanni, N., Dongre, A.R., Hefta, S.A. and Opiteck, G.J. (2002) Biomarker discovery in urine by proteomics. Journal of Proteome Research 1, 161–169.

18. Sun, W., Li, F., Wu, S., Wang, X., Zheng, D., Wang, J. and Gao, Y. (2005) Human urine proteome analysis by three separation approaches. Proteomics 5, 4994–5001.

19. Soldi, M., Sarto, C., Valsecchi, C., Magni, F., Proserpio, V., Ticozzi, D. and Mocarelli, P. (2005) Proteome profile of human urine with two-dimensional liquid phase fractionation. Proteomics 5, 2641–2647.

20. Rasmussen, H.H., Orntoft, T.F., Wolf, H. and Celis, J.E. (1996) Towards a comprehensive database of proteins from the urine of patients with bladder cancer. The Journal of Urology 6, 2113–2119.

21. Thongboonkerd, V., McLeish, K.R., Arthur, J.M. and Klein, J.B. (2002) Proteomic analysis of normal human urinary proteins isolated by acetone precipitation or ultracentrifugation. Kidney International 62, 1461–1469.

22. Hortin, G.L., Meilinger, B. and Drake, S.K. (2004) Size-selective extraction of peptides from urine for mass spectrometric analysis. Clinical Chemistry 50, 1092–1095.

23. Zhang, X., Leung, S., Morris, C.R. and Shigenaga, M.K. (2004) Evaluation of a novel, integrated approach using functionalized magnetic beads, benchtop MALDI-TOF-MS with prestructured sample supports, and pattern recognition software for profiling potential biomarkers in human plasma. Journal of Biomolecular Techniques 15, 167–175.


9

Combining Laser Capture Microdissection and Proteomics Techniques

Dana Mustafa, Johan M. Kros, and Theo Luider

Summary

Laser microdissection is an effective technique for harvesting pure cell populations from complex tissue sections. In addition to the use of microdissected cells in DNA and RNA studies, it has been shown that the small number of cells obtained by this technique can also be used for proteomics analysis. Combining laser capture microdissection with different types of mass spectrometers has opened ways to find and identify proteins that are specific for various cell types, tissues, and their morbid alterations. Although the combination of microdissection with the currently available proteomic techniques has not yet reached genome-wide representation of all proteins present in a tissue, it is a feasible way to find significantly differentially expressed proteins in target tissues. Recent developments in mass spectrometric detection, followed by proper statistics and bioinformatics, make it possible to analyze the proteome of as few as 100–200 cells. Validation of the results is, of course, essential. The present review describes and discusses the various methods developed to target cell populations of interest by laser microdissection, followed by analysis of their proteome.

Key Words: laser capture microdissection; matrix-assisted laser desorption/ionization; Fourier transform mass spectrometry; time-of-flight mass spectrometry; liquid chromatography-electrospray ionization tandem mass spectrometry; two-dimensional polyacrylamide gel electrophoresis; differential in-gel electrophoresis; protein chip technology.

Abbreviations: LCM: Laser Capture Microdissection, LMM: Laser Microbeam Microdissection, LPC: Laser Pressure Catapulting, 2D PAGE: Two-dimensional Polyacrylamide Gel Electrophoresis, 2D DIGE: Differential In-gel Electrophoresis, SDS: Sodium Dodecyl Sulphate, MALDI-TOF/MS: Matrix-assisted Laser Desorption/Ionization Time-of-flight Mass Spectrometry, MALDI-FTMS: Matrix-assisted Laser Desorption/Ionization Fourier Transform Mass Spectrometry, LC-ESI-MS/MS: Liquid Chromatography-Electrospray Ionization Tandem Mass Spectrometry, HPLC: High Performance Liquid Chromatography, SELDI-TOF: Surface-enhanced Laser Desorption/Ionization Time-of-flight, ICAT: Isotope-coded Affinity Tag

From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols
Edited by: A. Vlahou © Humana Press, Totowa, NJ

1. Introduction

In recent years, significant progress in the analysis of the entire genome has triggered efforts to further analyze normal and abnormal protein expression patterns. There is, for instance, an eagerness to discover more and better diagnostic markers for specific diseases. High expectations that better biomarkers would improve diagnosis and treatment monitoring have driven technical developments. Human tissues are usually composed of rather complex mixtures of different cell types. Many techniques have been used to isolate pure cell populations, and each has its advantages and limitations. For example, immunohistochemistry is an established and relatively easy technique for localizing protein expression. A drawback of immunohistochemistry, however, is that it does not allow quantitative assessment of proteins. Another way to obtain information about particular cell populations is to grow cell cultures in order to amplify target cells. Although this approach is technically feasible, the biological characteristics of the original cells may not be faithfully preserved in an in vitro environment (1). Alternatively, xenografts mimic the normal situation more closely, but again this method reflects the real situation of cells in vivo only to some extent (2). Another way of separating cell populations for further investigation is flow cytometry, which has been applied successfully in the study of many disease processes. Flow cytometric analysis is applied to cell suspensions and requires specific markers for selecting the cell population. To the best of our knowledge, the combination of flow cytometry and subsequent mass spectrometry (MS) has not yet been described for the analysis of solid tissues. In this review, we discuss cell purification and harvesting techniques based on laser microdissection that are currently applied prior to MS analysis.

2. Laser Capture Microdissection

Several microdissection techniques have been described for selecting specific cell populations in heterogeneous tissues. Most involve using a needle to scrape off cells of interest under direct microscopic visualization (3,4). This method, however, tends to be slow, tedious, and highly operator dependent (2). In 1992, Shibata and coworkers described a new method of cell isolation. They placed a specific pigment over small numbers of cells in a tissue section, where it served as an umbrella that prevented the covered cells from being destroyed; ultraviolet light was then used to destroy the DNA/RNA of the uncovered cells (5). Shortly afterwards, laser capture microdissection (LCM) under direct microscopic visualization was developed by Liotta and coworkers at the National Cancer Institute. This way of isolating target cells permits rapid, reliable laser microdissection of specific cell populations from a section of a complex, heterogeneous tissue (6). In this approach, a tissue section is placed in the holder of an inverted microscope. A transparent thermoplastic polymer coating [e.g., ethylene vinyl acetate (EVA) (7)] is placed in contact with the tissue. The EVA polymer is positioned over microscopically selected cell clusters and is then precisely activated by a near-infrared laser pulse steered by the investigator. Laser activation causes the polymer to bind specifically to the targeted area. When the EVA, together with the tissue bound to it, is removed from the section, the selected cell aggregates are isolated for molecular analysis (8). LCM is compatible with a variety of cellular staining methods and tissue preservation protocols (9).

Depending on the laser microdissection device used, the collection caps are positioned in different ways. For instance, in the PixCell II system (Arcturus Engineering, Mountain View, CA, USA), the caps make contact with the tissue sections; therefore, tissue preparation must meet strict requirements. The PALM microlaser dissector (PALM Microlaser Technologies AG, Bernried, Germany) provides a powerful separation; an important application of its cutting UV laser is laser microbeam microdissection combined with laser pressure catapulting (10). Special glass slides covered with a polyethylene naphthalate (PEN) membrane help stabilize the morphological integrity of the captured area (11) (Fig. 1). In this method, the collection caps do not make any contact with the tissue sections, which increases flexibility with respect to section preparation (12). Both LCM techniques are specific enough to dissect single cells, but the PALM can dissect smaller tissue areas than the PixCell system. The two microdissection methods yield RNA of comparable quality and quantity, but they have not been directly compared with regard to recent developments in protein retrieval for mass spectrometric applications (13). Collecting large quantities of cells by LCM is time consuming, since the cells of interest must be visualized microscopically in stained tissue sections before laser dissection. The software and hardware of the different laser microdissection systems are still developing.


162 Mustafa et al.<br />

Fig. 1. A scheme that represents the principle of laser capture microdissection. (Labeled components: laser objective, stage, slide, tissue section, PEN membrane, microdissected tissue, cap, and buffer droplet.)

3. LCM and Two-Dimensional Gel Electrophoresis

A new development is the use of LCM to retrieve proteins from tissues for further analysis by proteomic techniques, and several approaches have been applied to laser-microdissected cells so far. In 2000, Emmert-Buck and coworkers applied two-dimensional polyacrylamide gel electrophoresis (2D PAGE) to 50,000 microdissected epithelial cells, comparing tumor cells and normal controls from two patients with oesophageal cancer (14). Silver staining of the gels visualized 675 distinct proteins and isoforms. Seventeen differentially expressed spots were further analyzed by MS, which identified two specific proteins, cytokeratin 1 and annexin I. These proteins were assumed to be present at an abundance of 50,000–1,000,000 copies per cell (14). Using colon cancer as a model, Lawrie and coworkers also showed that protein expression can be investigated by combining LCM with proteome analysis by 2D PAGE and MS (15).
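The copy-number estimate above can be converted into an absolute amount with Avogadro's constant. The short Python sketch below, assuming the 50,000-cell sample size and the 50,000–1,000,000 copies per cell quoted for the oesophageal study, suggests that each identified protein was present at only a few to a few tens of femtomoles per sample. The function name is hypothetical.

```python
# Convert a per-cell copy number into the total amount of one protein in a sample.
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def protein_amount_fmol(copies_per_cell: float, n_cells: int) -> float:
    """Total amount of one protein in n_cells, in femtomoles (1 fmol = 1e-15 mol)."""
    molecules = copies_per_cell * n_cells
    return molecules / AVOGADRO * 1e15


if __name__ == "__main__":
    n_cells = 50_000  # microdissected epithelial cells per sample (14)
    for copies in (50_000, 1_000_000):  # abundance range quoted in the text
        amount = protein_amount_fmol(copies, n_cells)
        print(f"{copies:>9,} copies/cell -> {amount:.1f} fmol")
```

Running the script prints about 4.2 fmol for the low end and 83.0 fmol for the high end of the range, which illustrates why tens of thousands of microdissected cells were needed for silver-stained 2D gels.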

To compensate for the relatively low numbers of cells that LCM produces, an extra step has been added to the separation method: in addition to 2D PAGE of the microdissected cells, 2D PAGE of the whole section from the same set of samples can be useful, because the spots of interest can then be identified by MS from the gels made of whole sections (Table 1). Comparing silver-stained 2D gels of microdissected epithelial cells from ovarian cancer with 2D gels of whole sections of the same ovarian samples facilitated the discovery of 23 differentially expressed proteins between low malignant potential and invasive ovarian cancers (16). In-gel digestion of the specific gel spots followed by MS/MS analysis identified glyoxalase I, RhoGDI, and a 52 kDa FK506 binding protein (16). In another study based on 2D PAGE, 315 protein spots were detected in 100,000 cells collected by LCM from normal and cancerous ductal units in breast tissue sections (17). Subsequent MS measurement of the spots identified 57 differentially expressed proteins between the two groups of samples (17).

The relatively low number of microdissected cells makes it important to load equivalent amounts of protein on the gels. Shekouh and coworkers (18) therefore followed a strategy to increase the accuracy of 2D PAGE of LCM samples: the samples were first separated by one-dimensional sodium dodecyl sulphate (SDS)-PAGE, stained with silver, and subjected to densitometry, and the staining intensity was used to normalize the samples. Silver-stained 2D PAGE images from 50,000 microdissected adenocarcinoma cells were compared with images from whole sections of pancreatic samples. Spots of interest were subjected to MALDI-TOF/TOF MS, which identified S100A6 as a protein overexpressed in pancreatic cancer cells (18). The same methodology has been used to study the role of a specific molecule, HER-2/neu, in breast cancer (19). About 50,000–70,000 cells were microdissected from each of three HER-2/neu-positive and three HER-2/neu-negative breast tumors, which led to the detection of about 500–600 protein spots per gel. Comparing these two groups identified cytokeratin 19 (CK19) as a protein overexpressed in HER-2/neu-positive breast cancer (19). In another study, 2D PAGE of 10,000 microdissected cells from hepatocellular carcinoma (HCC) samples was compared with that of the surrounding normal tissue. The investigators visualized about 868 spots, of which 20 were considered differentially expressed. In-gel digestion of these proteins followed by ESI-MS/MS identified 11 proteins, four of which were considered novel candidate markers of hepatitis B-related HCC (20). This approach, in which microdissected cells are separated by 2D PAGE and biomarkers are then identified by in-gel digestion and MS, has been applied to a wide range of cancers, with 10,000–100,000 cells harvested by LCM per sample for successful 2D electrophoresis (Table 1).
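The loading normalization used by Shekouh and coworkers can be expressed as a simple rescaling. The Python sketch below rests on our own assumptions, not details from the source: each 1D SDS-PAGE lane was loaded with the same volume of lysate, and silver-stain densitometry is proportional to total protein, which holds only within the linear range of the stain. It computes how much of each lysate to load so that every 2D gel receives the same protein amount as a chosen reference. All sample names and numbers are hypothetical.

```python
"""Equalize 2D-gel loading from 1D SDS-PAGE densitometry (illustrative sketch)."""


def loading_volumes(lane_intensity: dict[str, float],
                    reference: str,
                    reference_volume_ul: float) -> dict[str, float]:
    """Return the volume (µl) of each lysate that matches the reference load.

    Assumes every lane was run with the same lysate volume and that the
    integrated stain intensity is proportional to protein concentration.
    """
    ref = lane_intensity[reference]
    if ref <= 0:
        raise ValueError("reference intensity must be positive")
    volumes = {}
    for sample, intensity in lane_intensity.items():
        if intensity <= 0:
            raise ValueError(f"no signal for sample {sample!r}")
        # Weaker lanes (more dilute lysates) need proportionally more volume.
        volumes[sample] = reference_volume_ul * ref / intensity
    return volumes


if __name__ == "__main__":
    # Hypothetical integrated lane intensities from equal lysate volumes.
    intensities = {"tumor_1": 1.25e5, "tumor_2": 0.80e5, "normal_1": 1.00e5}
    for name, vol in loading_volumes(intensities, "normal_1", 20.0).items():
        print(f"{name}: load {vol:.1f} µl")
```

With these made-up values, the script suggests loading 16.0, 25.0, and 20.0 µl, respectively; in practice the lane intensities would first be checked against a dilution series to confirm linearity.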

Table 1
Overview of Different Methods to Combine Laser Microdissection and Different Proteomics Techniques

| Separation technique | Number of microdissected cells/sample | Number of visualized proteins | Identification technique | Number of significant differentially identified proteins | Number of samples/study | Tissue used | Reference |
|---|---|---|---|---|---|---|---|
| 2D PAGE, silver staining | 50,000 | Approximately 675 distinct proteins, including isoforms | Mass spectrometry and immunoblot analysis | n = 2; cytokeratin 1 and annexin I | 2 cancer samples and 2 normal samples | Esophageal cancer | (14) |
| 2D PAGE, silver staining | 1–5 μg of total cellular protein | Not determined | Mass spectrometry data from all protein spots cut from the gels | n = 3; cytokeratin 8, cytokeratin 18, and β-actin | 2 cancer samples and 2 normal samples | Colon cancer | (15) |
| 2D PAGE, silver staining | 50,000 | 23 differentially expressed proteins were discussed | ESI-MS identification from gels made of whole sections | n = 3; FK506 binding protein, glyoxalase I, and RhoGDI | 3 invasive OV and 2 noninvasive (LMP) OV | Ovarian cancer | (16) |
| 2D PAGE, silver staining | 100,000 | 315 protein spots | MS identification from gels made of whole sections | n = 57 observed proteins; n = 2 after confirmation | 6 samples of DCIS and 6 samples of normal ductal/lobular units | Breast cancer | (17) |
| 2D PAGE, silver staining | 50,000 | 800 protein spots | MALDI-TOF/TOF | n = 1; calcium-binding protein S100A6 | 4 cancer samples and 4 normal samples | Pancreatic cancer | (18) |
| 2D PAGE, silver staining | 50,000–70,000 | 500–600 protein spots | MALDI-TOF mass spectrometry | n = 7; cytokeratin 19, tropomyosin 3, aldolase A, glyoxalase I, cathepsin D chain 3, albumin, and MnSOD | 3 HER-2/neu-positive samples and 3 HER-2/neu-negative samples | HER-2/neu-positive breast cancer cells | (19) |
| 2D PAGE, silver staining | 10,000 | 868 protein spots | Nano-flow ESI-MS/MS | n = 11 proteins, four of them novel markers | 10 hepatic cancer cell samples | Hepatitis B-positive hepatic cancer cells | (20) |
| 2D DIGE, lysine-specific dyes | 250,000 | 1038–1088 protein spots | Capillary LC tandem mass analysis | n = 1; tumor rejection antigen (gp96) | One sample of normal cells and one sample of cancer cells | Esophageal carcinoma | (21) |
| 2D DIGE, lysine-specific dyes | 30,000 | 1200 protein spots | MALDI-TOF measurements | No further identifications | One sample of gastric mucosa and one of SPEM | Gastric metaplasia samples | (22) |
| 2D DIGE, lysine-specific dyes | 50,000 | Not applicable | MALDI-TOF and/or immunoblotting for protein identification | n = 32 | Five samples containing malignant and normal breast tissue | Breast epithelial cells | (23) |
| 2D DIGE, cysteine-specific dyes | 5000 | ∼1000 protein spots | MALDI-MS and MS/MS measurements | n = 40 | Cultured oncogene-transduced epithelial cells and precancerous versus cancerous tissue | | (25) |
| 2D DIGE, cysteine-specific dyes | 10–100 glomeruli (0.5–3 μg protein) | 900–1400 protein spots | Nano LC-ESI-MS/MS | n = 23 between mouse glomeruli and mouse cortex | 3 protein extracts from human glomeruli, and glomeruli and cortex isolated independently from 3 mice | | (26) |
| (IPG-IEF) 2D-PAGE gel | 3.8 μg protein | Not applicable | Mass spectrometry | n = 29 | 2 samples containing renal cell carcinoma and normal kidney tissue | | (30) |
| (IPG-IEF) 2D-PAGE gel | Approximately … | | | | | | |
| HPLC system | 10,000 | Not applicable | ESI mass spectrometry followed by MS/MS | n = 9 | 3 slides from the same cell culture | Breast cancer cell line (SKBR-3) | (34) |
| ¹⁶O/¹⁸O isotopic labeling of peptides | 10,000 | Not applicable | Reverse-phase LC-ESI-MS/MS on an ion trap mass spectrometer | n = 76 | 2 samples of invasive ductal carcinoma of the breast | Ductal carcinoma of the breast | (29) |
| Gel-free method | 30,000–50,000 | Not applicable | SELDI-TOF/MS | n = 1; prostate carcinoma-associated protein (PCa-24) | 17 prostate carcinomas containing normal and BPH tissue, and 7 BPH samples | Prostate cancer | (41) |
| Gel-free method | ∼2000 | Not applicable | MALDI-TOF/MS | n = 2; calgranulin A and chaperonin 10 | 8 endometrioid adenocarcinomas, 4 proliferative endometria, and 4 secretory endometria | Endometrial cancer | (36) |
| Gel-free method | 150 | Not applicable | MALDI-TOF/MS | No protein identifications; unique pattern of ∼35 peptides for trophoblast and stroma cells | 1 placenta sample containing trophoblasts and surrounding stroma cells | Placenta samples | (37) |
| Gel-free method | 2000–2400 | Not applicable | MALDI-TOF/TOF mass spectrometry | No protein identifications; 9 differentially expressed peptides | 6 invasive ductal breast carcinomas containing cancer and normal cells | Breast cancer | (38) |
| Gel-free method | 3000 | Not applicable | Nano LC-FTICR mass spectrometry | n = 1003 proteins identified | 2 replicate samples of breast cancer epithelial cells | Breast cancer | Umar et al., 2006 |
| ProteinChip technology | 3000–5000 | Not applicable | Isolation by two-dimensional gel electrophoresis and tandem mass spectrometry analysis | n = 1; annexin V | 57 head and neck tumor samples and 44 mucosa samples | Head and neck cancer | (40) |
| ProteinChip technology | 3000–5000 | Not applicable | Isolation by reverse-phase chromatography and SDS-PAGE, then identification by MS/MS analysis | n = 1; heat shock protein 10 | 39 colorectal tumor samples, 40 normal mucosa samples, and 29 adenoma samples | Colorectal cancer | (39) |

Abbreviations: 2DE: two-dimensional gel electrophoresis; OV: ovarian cancer; LMP: low malignant potential; DCIS: ductal carcinoma in situ; HCC: hepatocellular carcinoma; BPH: benign prostatic hyperplasia; SPEM: spasmolytic polypeptide-expressing metaplasia; PR: progesterone receptor; ER: estrogen receptor.

4. LCM and Differential In-Gel Electrophoresis

In 2002, Zhou and coworkers described a technique called differential in-gel electrophoresis (DIGE) (21). Two pools of proteins are labeled with the fluorescent dyes 1-(5-carboxypentyl)-1′-propylindocarbocyanine halide (Cy3) N-hydroxysuccinimidyl ester and 1-(5-carboxypentyl)-1′-methylindodicarbocyanine halide (Cy5) N-hydroxysuccinimidyl ester (21). The labeled proteins are mixed and separated in the same 2D gel. This strategy improves the sensitivity of detection and broadens the range of candidate proteins that can be detected. Molecular weight- and charge-matched cyanine dyes enable multiplex labeling, so that different samples can be run on the same gel. The same investigators described a powerful tool for the molecular characterization of cancer progression and the identification of cancer-specific protein markers by combining 2D DIGE with MS. They compared 2D DIGE of about 250,000 microdissected cells from oesophageal carcinoma with that of normal epithelial cells from the oesophagus. The cancer cell lysate yielded 1038 protein spots, while the normal epithelial lysate yielded 1088. In-gel digestion of the differentially expressed protein spots was followed by capillary high-performance liquid chromatography (HPLC) tandem mass analysis for identification. In this way, tumor rejection antigen (gp96) was found to be upregulated in oesophageal squamous cell cancer (21). Applying the same procedure to smaller numbers of microdissected cells from biopsy samples with gastric metaplasia also appeared to be successful (22): approximately 1200 spots were detected from 30,000 microdissected cells, and 28 of these spots were overexpressed in the metaplasia samples compared with the normal surface cells (22). However, subsequent MALDI-TOF measurements of the spots did not lead to protein identifications. The same procedure, applied to 50,000 microdissected cells, identified 32 proteins in breast epithelial cancer cells (23), 13 of which had not previously been associated with these tumors (23).

One technical aspect of the 2D DIGE method needs special attention: the fluorescent dyes bind only to lysine residues (21). Proteins with a high percentage of lysine residues are therefore labeled more efficiently than proteins containing little or no lysine. A new generation of dyes that react with cysteine residues has improved the sensitivity of DIGE (24). Although cysteine is generally less abundant than lysine in proteins, cysteine labeling can be carried to saturation, whereas lysine labeling must be limited to 1–3% of all the residues to prevent loss of solubility when bulky hydrophobic dyes are coupled to the polar lysine residues (24). Greengauz-Roberts and coworkers applied saturation labeling of cysteine residues to about 5000 cells obtained by LCM of metaplasia and cancer cells. A total of 1471 distinct protein features were observed from this relatively small number of cells, and 96 of these spots were analyzed further. MALDI-MS and MS/MS measurements, combined with the specific position of each protein in the gel, identified 42 proteins in the cancer samples (25). Sitek and coworkers also described a novel approach to analyze glomerular proteins from mouse and human samples using DIGE saturation labeling (26). Only 10 glomeruli (0.5 μg) picked by LCM from a slide of a human kidney biopsy were sufficient to visualize 900 spots by DIGE (26).

2D DIGE holds several advantages over conventional 2D gels. One of the most important is improved reproducibility: because the pooled samples are separated in the same gel, gel-to-gel differences are minimized. Protein expression in two cell populations or samples can therefore be compared more accurately, and differences are easier to identify. Quantitative differences in protein content are also measured better with fluorescent dyes. In addition, 2D DIGE enables higher-throughput analysis of 2D gels because it lends itself to automated gel imaging. Labeling proteins with fluorescent dyes did not affect protein identification by MS, because only a small percentage of the molecules of each protein is labeled. Finally, 2D DIGE requires fewer microdissected cells for protein identification than the other 2D electrophoresis techniques (Table 1).
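Because both samples share one gel, DIGE quantification comes down to comparing the two dye channels spot by spot. The Python sketch below shows one common way to do this; it is our illustrative choice, not a procedure described in the cited studies. It computes log2 Cy5/Cy3 ratios, centers them on the median to correct for global differences in labeling efficiency and scanner gain (assuming most spots are unchanged), and flags spots beyond a chosen fold-change threshold. Spot IDs, volumes, and the 2-fold cutoff are hypothetical, and real studies add replicate gels and statistical testing.

```python
import math
import statistics


def dige_log_ratios(cy3: dict[str, float], cy5: dict[str, float]) -> dict[str, float]:
    """Median-centered log2(Cy5/Cy3) ratio for every spot detected in both channels."""
    shared = [s for s in cy3 if s in cy5 and cy3[s] > 0 and cy5[s] > 0]
    if not shared:
        raise ValueError("no spots with signal in both channels")
    raw = {s: math.log2(cy5[s] / cy3[s]) for s in shared}
    # Assume most spots are unchanged: shift so that the median ratio is 0.
    offset = statistics.median(raw.values())
    return {s: r - offset for s, r in raw.items()}


def flag_changed(log_ratios: dict[str, float], fold: float = 2.0) -> dict[str, str]:
    """Label spots whose normalized ratio is beyond +/- log2(fold)."""
    cutoff = math.log2(fold)
    calls = {}
    for spot, r in log_ratios.items():
        if r >= cutoff:
            calls[spot] = "up in Cy5 sample"
        elif r <= -cutoff:
            calls[spot] = "down in Cy5 sample"
    return calls


if __name__ == "__main__":
    # Hypothetical spot volumes: Cy3 = normal epithelium, Cy5 = tumor.
    cy3 = {"s1": 1000, "s2": 500, "s3": 800, "s4": 1200, "s5": 300}
    cy5 = {"s1": 1100, "s2": 2100, "s3": 850, "s4": 250, "s5": 330}
    print(flag_changed(dige_log_ratios(cy3, cy5)))
```

With these made-up volumes, spot s2 is flagged as up and s4 as down in the Cy5-labeled sample; the remaining spots stay close to a ratio of 1 after median centering.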

5. LCM and Different Labeling Techniques

Labeling facilitates the comparison of the proteomes of two different samples (for instance, normal and tumor cells). In 2004, Li and coworkers described a method for qualitative and quantitative protein analysis that combines LCM with isotope-coded affinity tag (ICAT) labeling and two-dimensional liquid chromatography coupled with tandem mass spectrometry (2D-LC-MS/MS) (27). Approximately 50,000–100,000 cells of HCC and non-HCC hepatocytes were microdissected; a total of 644 proteins were qualitatively determined in HCC hepatocytes, and 261 proteins that differed between the two groups were quantified (28). Also in 2004, ¹⁶O/¹⁸O isotopically labeled peptides were generated from 10,000 microdissected cells of ductal carcinoma of the breast, which allowed the identification of 76 proteins (29). Using reverse-phase liquid chromatography-electrospray ionization tandem mass spectrometry (LC-ESI-MS/MS), Zang and coworkers were able to identify proteins that were significantly upregulated in the breast tumor cells (29). Separating radiolabeled proteins on a high-resolution, 54 cm serial immobilized pH gradient isoelectric focusing 2D-PAGE gel provided a precise estimate of the abundance ratio of proteins from two samples (30). Radioiodination of 3.8 μg of renal carcinoma proteins and 3.8 μg of normal kidney proteins with ¹²⁵I and ¹³¹I, followed by mass spectrometric identification, revealed 29 differentially expressed proteins (30). Applying the same radioactive labeling methodology to a pool of microdissected breast cancer cells provided a sensitive method for identifying differentially expressed proteins that correlate with the presence of progesterone receptor in estrogen receptor-positive breast cancer (31).
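In the usual enzymatic form of ¹⁶O/¹⁸O labeling, trypsin incorporates up to two ¹⁸O atoms into the C-terminal carboxyl group of each peptide when the sample is processed in H₂¹⁸O, so a fully labeled peptide appears about 4 Da heavier than its ¹⁶O partner. The source does not describe this mechanism; it is standard background added here. The Python sketch below computes where the heavy partner of a peptide ion should appear and a crude heavy/light ratio. It deliberately ignores overlap with the natural isotope envelope and incomplete (single-¹⁸O) labeling, both of which real quantification software must correct; the function names are hypothetical.

```python
O16 = 15.99491462  # monoisotopic mass of 16O (Da)
O18 = 17.99915961  # monoisotopic mass of 18O (Da)
SHIFT_2X18O = 2 * (O18 - O16)  # ~4.0085 Da when both C-terminal oxygens are exchanged


def heavy_partner_mz(light_mz: float, charge: int) -> float:
    """m/z expected for the doubly 18O-labeled form of a peptide ion."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return light_mz + SHIFT_2X18O / charge


def crude_ratio(light_intensity: float, heavy_intensity: float) -> float:
    """Heavy/light ratio from monoisotopic peak heights (no isotope correction)."""
    if light_intensity <= 0:
        raise ValueError("light peak must have positive intensity")
    return heavy_intensity / light_intensity


if __name__ == "__main__":
    # Hypothetical doubly charged peptide observed at m/z 1000.0 in the 16O sample.
    print(round(heavy_partner_mz(1000.0, 2), 4))
    print(crude_ratio(2.0e5, 3.0e5))
```

For a doubly charged ion the partner sits about 2.004 m/z units higher (the script prints 1002.0042), which is why sufficient resolution is needed to separate the heavy peak from the light peptide's own isotope peaks.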


6. Combining LCM and Different Separation Methods

It has been shown previously that the numbers of detected and identified peptides and proteins increase significantly when MALDI-MS (32) or ESI-MS (33) is coupled to a peptide or protein separation system. In 2003, Wu and coworkers described a method for discovering biomarkers from microdissected homogeneous cells of breast cancer cell lines (34). After the cells were captured, the peptide digest was fractionated by reversed-phase HPLC and analyzed by ion trap MS (34). HPLC fractionation of about 10,000 cells from a breast cancer cell line (SKBR-3), followed by ESI-MS, identified low-abundance proteins in the cell line. Capillary isoelectric focusing combined with reverse-phase nano-LC in an automated, integrated platform provides systematic resolution of complex peptide mixtures generated from limited protein quantities (7). This method separates peptides based on differences in isoelectric point and hydrophobicity, and it eliminates peptide loss and analyte dilution (7). Coupled to ESI-tandem MS, this separation enabled the detection of 6866 peptides, leading to the identification of 1820 proteins from 20,000 microdissected glioblastoma cells (7). To increase the number of proteins identified from LCM of brain samples, Gozal and coworkers added an extra separation step (35). After the cells were collected by LCM, the total protein was extracted and resolved on an SDS gel. The gels were cut into multiple pieces, followed by trypsin digestion, and the peptides were analyzed by highly sensitive liquid chromatography-tandem mass spectrometry (LC-MS/MS). This approach identified hundreds to thousands of proteins (35).
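The workflows above all start by digesting proteins with trypsin before separation and MS. As a sketch of what that step produces, the Python function below performs an in-silico tryptic digest using the common simplified rule (cleave after K or R, but not before P), optionally allowing missed cleavages. This rule is a textbook approximation of trypsin specificity, not something specified in the cited studies, and the example sequence is arbitrary.

```python
import re


def tryptic_peptides(sequence: str, missed_cleavages: int = 0,
                     min_length: int = 1) -> list[str]:
    """In-silico trypsin digest: cut C-terminal to K/R unless followed by P."""
    # Zero-width split after K or R when the next residue is not proline
    # (splitting on empty matches requires Python 3.7 or later).
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence.upper()) if f]
    peptides = []
    for start in range(len(fragments)):
        # Join up to `missed_cleavages` extra fragments to model missed cuts.
        for extra in range(missed_cleavages + 1):
            end = start + extra + 1
            if end > len(fragments):
                break
            peptide = "".join(fragments[start:end])
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


if __name__ == "__main__":
    print(tryptic_peptides("MKWVTFISLLKPLLRAGEKR"))
```

The example prints `['MK', 'WVTFISLLKPLLR', 'AGEK', 'R']`: the K followed by P is not cleaved. In practice a `min_length` of about six residues is often used, because very short peptides are rarely informative for identification.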

7. LCM and Gel-Free Mass Spectrometry

The peptide digest of cells harvested by LCM can also be measured directly by MS, without an initial separation step on 2D PAGE (known as "gel-free MS"). Guo and coworkers directly analyzed endometrial epithelial cells obtained by LCM using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF/MS) (36). A total of 16 physiologic and malignant endometrial samples, including four proliferative and four secretory endometria and eight endometrioid adenocarcinomas, were used for this study. Approximately 2000 cells were sufficient to confirm overexpression of two proteins, calgranulin A and chaperonin 10, in the epithelial cells of endometrial adenocarcinoma samples (36). In another study, direct analysis of 125 trophoblast and stroma cells from placental tissue detected significant differences in protein expression between these two cell types (37). Differentially expressed proteins between breast cancer and normal samples can also be detected by direct MALDI-TOF/MS measurements of 2000–2400 LCM cells (38). In a recent study, over 1000 proteins were identified from 3000 microdissected cells by combining advanced nanoLC with high-resolution Fourier transform mass spectrometry (FTMS) (39).
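Gel-free comparisons of this kind ultimately depend on deciding which peaks in two spectra correspond to the same species, and the high mass accuracy of FTMS is what makes this tractable. The Python sketch below matches two centroided peak lists within a parts-per-million tolerance. The 5 ppm default, the function names, and the peak values are our assumptions; real pipelines also recalibrate, deisotope, and normalize spectra before comparing them.

```python
from bisect import bisect_left


def ppm_error(observed: float, reference: float) -> float:
    """Mass error of `observed` relative to `reference`, in parts per million."""
    return (observed - reference) / reference * 1e6


def match_peaks(peaks_a: list[float], peaks_b: list[float],
                tol_ppm: float = 5.0) -> list[tuple[float, float]]:
    """Pair each m/z in peaks_a with the closest m/z in peaks_b within tol_ppm."""
    b_sorted = sorted(peaks_b)
    pairs = []
    for mz in peaks_a:
        i = bisect_left(b_sorted, mz)
        # The nearest candidate is at index i - 1 or i of the sorted list.
        candidates = [b_sorted[j] for j in (i - 1, i) if 0 <= j < len(b_sorted)]
        if not candidates:
            continue
        best = min(candidates, key=lambda x: abs(x - mz))
        if abs(ppm_error(best, mz)) <= tol_ppm:
            pairs.append((mz, best))
    return pairs


if __name__ == "__main__":
    # Hypothetical centroided m/z lists from two microdissected cell populations.
    spectrum_a = [1475.75278, 1707.77693, 1994.98513]
    spectrum_b = [1475.7541, 1707.7900, 2025.9488]
    for a, b in match_peaks(spectrum_a, spectrum_b):
        print(f"{a:.5f} ~ {b:.5f} ({ppm_error(b, a):+.2f} ppm)")
```

Only the first pair falls within tolerance (about +0.89 ppm); the second differs by roughly 7.7 ppm and is left unmatched, which shows how sharply the tolerance choice affects which peaks count as shared.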

8. LCM and Protein Chip Technology

There are currently two approaches to producing arrays capable of generating protein network information. The first is the forward-phase array, in which each spot on the slide represents a specific antibody; the array is therefore incubated with only one test sample (9). The second is the reverse-phase array, in which each spot represents an individual test sample, so that the array is composed of multiple different samples that can be tested under the same experimental conditions. In addition, when such arrays are probed separately with two different classes of antibodies, the total and phosphorylated forms of the protein of interest can be specifically detected (9). By combining LCM with protein chip technology, Melle and coworkers identified annexin V as a specific protein in head and neck cancer patients and heat shock protein 10 as a biomarker in colorectal cancer patients (40,41). Protein lysates from 3000 to 5000 microdissected cells were analyzed on both strong anion exchange and weak cation exchange arrays, followed by separation steps (e.g., 2D gel or reverse-phase chromatography and SDS-PAGE), MS measurements, and MS/MS analysis (40,41). In both cases, a validation step by immunohistochemistry confirmed the findings.
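In the reverse-phase format every spot is a sample, so the readouts of two identically printed arrays, one probed for the total protein and one for its phosphorylated form, can be combined sample by sample. The Python sketch below shows that bookkeeping under our own assumptions: background-subtracted spot intensities and antibodies used within their linear range. The resulting phospho/total value is only a relative index for comparing samples on the same pair of arrays, not an absolute phosphorylation stoichiometry, because the two antibodies differ in affinity. Sample names and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SpotSignal:
    total: float    # background-subtracted signal, total-protein antibody array
    phospho: float  # same spot position, phospho-specific antibody array


def phospho_index(spots: dict[str, SpotSignal],
                  min_total: float = 1.0) -> dict[str, float]:
    """Phospho/total signal ratio per sample; skips spots with too little total signal."""
    index = {}
    for sample, sig in spots.items():
        if sig.total < min_total:
            continue  # ratio would be dominated by noise
        index[sample] = sig.phospho / sig.total
    return index


if __name__ == "__main__":
    arrays = {
        "tumor_epithelium": SpotSignal(total=820.0, phospho=410.0),
        "normal_epithelium": SpotSignal(total=900.0, phospho=90.0),
        "empty_spot": SpotSignal(total=0.4, phospho=0.2),
    }
    for sample, value in phospho_index(arrays).items():
        print(f"{sample}: {value:.2f}")
```

With these invented signals, the tumor sample shows a fivefold higher relative phosphorylation index (0.50 versus 0.10), and the near-empty spot is excluded rather than producing a misleading ratio.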

In other studies, surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) analysis was applied to microdissected cells because it works with smaller amounts of material than other techniques such as 2D gels (42). Using 30,000–50,000 cells from prostate carcinoma specimens, the unique expression of a prostate carcinoma-associated protein, called PCa-24, in the epithelial cells was demonstrated (42). Protein microarrays pose several technical challenges (43), but their application offers the advantages of scalability, flexibility, and automated processing (43). Arrays may also allow control of key parameters such as temperature, pH, and cofactor concentration, which cell-based systems do not easily afford.

9. Perspectives of LCM and Mass Spectrometry Analysis

The use of LCM to obtain (relatively) pure cell populations for further analysis of their proteome is an important addition to the arsenal of techniques in bioscience. However, the technique is still time-consuming and yields relatively small numbers of cells. To overcome this problem, alternative tissue-processing steps are needed. Sample collection and preparation are crucial. During microdissection, special care should be taken to prevent loss and contamination of the target material; for instance, material should not drop from, or stick to, the cap of the tubes used. Another consideration is to minimize the number of transfers of the collected material from one tube to another. For the same reason, the use of low protein binding tubes is recommended. A protocol for sample preparation is included in this chapter (Box 1).

Fig. 2. MALDI FTMS spectrum obtained from 150 microdissected cells from a frozen glioma tissue sample. The spectrum contains approximately one thousand monoisotopic peaks between 700 and 3000 m/z at relatively high peak intensities. The inset zooms in on a small part of the spectrum, between 1700 and 2000 m/z, and shows the very high number of peaks obtained from measuring a very small number of cells. The peaks can be identified by different MS sequencing techniques; some examples of identified peptides are indicated in the spectrum. (Main panel: intensity versus m/z, 1000–3500; inset: m/z 1700–2000. Annotated peptides include fibrinogen, GAPDH, CD34 antigen, GFAP, tubulin alpha 2, and Hb.)

2D PAGE is a well-established technique that has been used in combination with LCM in many studies so far. Its need for relatively large numbers of cells prevents the measurement of large numbers of samples, as indicated in Table 1. In addition, its relatively low reproducibility hampers sound statistical analysis. 2D DIGE improves reproducibility and also lowers the required amount of microdissected tissue. However, this technique is suitable for experimental research only.


174 Mustafa et al.<br />

LCM sample preparation protocol:<br />

Cryosections of 8 μm were made from glioma brain tumor tissue and<br />

mounted on polyethylene naphthalate-covered glass slides (PALM Microlaser<br />

Technologies AG, Bernried, Germany) as described previously (38). The<br />

slides were fixed in 70% ethanol and stored at –20 °C for no more than 2<br />

days. After fixation and immediately before microdissection, the slides were<br />

washed twice with Milli-Q water, stained for 10 s in haematoxylin, washed<br />

again twice with Milli-Q water, and subsequently dehydrated in a series of 50,<br />

70, 95, and 100% ethanol solutions and air-dried. The PALM laser microdissection<br />

and pressure catapulting device, type P-MB, was used with PalmRobo<br />

v2.2 software at 40× magnification. Assuming that a cell measures<br />

10 × 10 × 10 μm, we microdissected an area of about 190,000 μm² of blood<br />

vessels and another area of the same size of the surrounding tumor tissue from<br />

each sample, resulting in approximately 1500 cells per sample. The microdissected<br />

cells were collected in the caps of PALM tubes in 5 μl of 0.1% RapiGest<br />

buffer (Waters, Milford, MA, USA). The caps were cut and placed onto<br />

0.5 ml Eppendorf Protein LoBind tubes (Eppendorf, Hamburg, Germany).<br />

Subsequently, these tubes were centrifuged at 12,000 × g for 5 min. To make<br />

sure that all the cells were covered with buffer, another 5 μl of RapiGest<br />

was added to the cells. All samples were stored at –80°C. After thawing,<br />

the microdissected tissue was disrupted by external sonication<br />

for 1 min at 70% amplitude at a maximum temperature of 25°C (Branson<br />

Ultrasonics, Danbury, USA). The samples were incubated at 37 and 100°C<br />

for 5 and 15 min, respectively, for protein solubilization and denaturation.<br />

To each sample, 1.5 μl of 100 ng/μl gold grade trypsin (Promega, Madison,<br />

WI, USA) in 3 mM Tris–HCl diluted 1:10 in 50 mM NH₄HCO₃ was added<br />

and incubated overnight at 37°C for protein digestion. To inactivate trypsin<br />

and to degrade the RapiGest, 2 μl of 500 mM HCl was added and incubated<br />

for 30 min at 37°C. Samples were dried in a SpeedVac (Thermo Savant,<br />

Holbrook, NY, USA) and reconstituted in 5 μl of 50% acetonitrile/0.5% trifluoroacetic<br />

acid/water prior to measurement. Samples were measured immediately<br />

or stored for a maximum of 10 days at 4°C.<br />
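The figure of roughly 1500 cells per sample in the protocol above follows from simple volume arithmetic: the dissected area times the section thickness, divided by the volume of one cell. The sketch below is illustrative only; it assumes every cell is a 10 μm cube, that the full 8 μm section thickness is captured, and the function name `estimate_cell_count` is hypothetical.

```python
def estimate_cell_count(area_um2: float, thickness_um: float,
                        cell_edge_um: float = 10.0) -> float:
    """Estimate the number of cells in a laser-microdissected region.

    Assumes cubic cells of edge `cell_edge_um` (10 x 10 x 10 um by default)
    packed without gaps, and that the whole section thickness is collected.
    Both are simplifications.
    """
    tissue_volume_um3 = area_um2 * thickness_um  # volume of dissected tissue
    cell_volume_um3 = cell_edge_um ** 3          # volume of a single cell
    return tissue_volume_um3 / cell_volume_um3


# Values from the protocol: ~190,000 um^2 dissected from an 8 um cryosection.
print(round(estimate_cell_count(190_000, 8)))  # 1520, i.e. roughly 1500 cells
```

Because the estimate is linear in area, doubling the dissected area doubles the expected cell count; real cell sizes and packing will shift the result.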

Recently, improvements in the resolution and detection limits of modern mass<br />

spectrometers, particularly FTMS, have opened a new research field: the analysis of<br />

small numbers of microdissected cells (in the range of 200–5000). FTMS<br />

has several distinctive characteristics: unrivalled mass resolution (on the order of<br />

100,000–1,000,000), high mass accuracy (below 1 ppm), a dynamic range of three to<br />

four orders of magnitude, and a good signal-to-noise ratio (44). These features<br />

facilitate combining this technique with LCM. For instance, by MALDI-FTMS,


Combining LCM and Proteomics Techniques 175<br />

peptide digests of no more than 150 cells taken from biological samples (e.g.,<br />

glioma vessel tissue) yielded informative mass spectra (Fig. 2). It is expected<br />

that techniques like FTMS will soon be implemented in routine laboratory practice<br />

for the detection of disease-related proteins in clinical specimens.<br />
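The FTMS figures of merit mentioned above are simple ratios: mass accuracy is the deviation of the measured from the theoretical m/z expressed in parts per million, and resolution is m/Δm, with Δm commonly taken as the peak width at half maximum (FWHM). The following is a minimal sketch of both calculations; the function names and the m/z values are hypothetical and chosen only for illustration.

```python
def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Mass error in parts per million (positive means measured too high)."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def resolving_power(mz: float, fwhm: float) -> float:
    """Resolution R = m / delta-m, using the peak width at half maximum."""
    return mz / fwhm


def fwhm_at(mz: float, resolution: float) -> float:
    """Peak width (FWHM) expected at a given m/z for a given resolution."""
    return mz / resolution


# Hypothetical peptide ion: theoretical m/z 1500.0000, measured 1500.0012.
print(round(ppm_error(1500.0012, 1500.0000), 2))  # 0.8, i.e. below 1 ppm
# At R = 100,000 a peak at m/z 1500 is only this wide:
print(fwhm_at(1500.0, 100_000))  # 0.015 (m/z units)
```

Higher resolution means narrower peaks, which is what allows closely spaced peptide signals from a few hundred cells to be separated.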

References<br />

1. Zhang, L., Zhou, W., Velculescu, V. E., Kern, S. E., Hruban, R. H., Hamilton, S. R.,<br />

Vogelstein, B. and Kinzler, K. W. (1997) Gene expression profiles in normal and<br />

cancer cells. Science 276, 1268–1272.<br />

2. Curran, S., McKay, J. A., McLeod, H. L. and Murray, G. I. (2000) Laser capture<br />

microscopy. Mol Pathol 53, 64–68.<br />

3. Going, J. J. and Lamb, R. F. (1996) Practical histological microdissection for PCR<br />

analysis. J Pathol 179, 121–124.<br />

4. Zhuang, Z., Bertheau, P., Emmert-Buck, M. R., Liotta, L. A., Gnarra, J., Linehan,<br />

W. M. and Lubensky, I. A. (1995) A microdissection technique for archival DNA<br />

analysis of specific cell populations in lesions



13. Ball, H. J. and Hunt, N. H. (2004) Needle in a haystack: microdissecting the<br />

proteome of a tissue. Amino Acids 27, 1–7.<br />

14. Emmert-Buck, M. R., Gillespie, J. W., Paweletz, C. P., Ornstein, D. K., Basrur, V.,<br />

Appella, E., Wang, Q. H., Huang, J., Hu, N., Taylor, P. and Petricoin, E. F. 3rd (2000)<br />

An approach to proteomic analysis of human tumors. Mol Carcinog 27, 158–165.<br />

15. Lawrie, L. C., Curran, S., McLeod, H. L., Fothergill, J. E. and Murray, G. I. (2001)<br />

Application of laser capture microdissection and proteomics in colon cancer. Mol<br />

Pathol 54, 253–258.<br />

16. Jones, M. B., Krutzsch, H., Shu, H., Zhao, Y., Liotta, L. A., Kohn, E. C. and<br />

Petricoin, E. F. 3rd (2002) Proteomic analysis and identification of new biomarkers<br />

and therapeutic targets for invasive ovarian cancer. Proteomics 2, 76–84.<br />

17. Wulfkuhle, J. D., Sgroi, D. C., Krutzsch, H., McLean, K., McGarvey, K.,<br />

Knowlton, M., Chen, S., Shu, H., Sahin, A., Kurek, R., Wallwiener, D.,<br />

Merino, M. J., Petricoin, E. F. 3rd, Zhao, Y. and Steeg, P. S. (2002) Proteomics<br />

of human breast ductal carcinoma in situ. Cancer Res 62, 6740–6749.<br />

18. Shekouh, A. R., Thompson, C. C., Prime, W., Campbell, F., Hamlett, J., Herrington,<br />

C. S., Lemoine, N. R., Crnogorac-Jurcevic, T., Buechler, M. W., Friess, H.,<br />

Neoptolemos, J. P., Pennington, S. R. and Costello, E. (2003) Application of laser<br />

capture microdissection combined with two-dimensional electrophoresis for the<br />

discovery of differentially regulated proteins in pancreatic ductal adenocarcinoma.<br />

Proteomics 3, 1988–2001.<br />

19. Zhang, D. H., Tai, L. K., Wong, L. L., Sethi, S. K. and Koay, E. S. (2005)<br />

Proteomics of breast cancer: enhanced expression of cytokeratin19 in human<br />

epidermal growth factor receptor type 2 positive breast tumors. Proteomics 5,<br />

1797–1805.<br />

20. Ai, J., Tan, Y., Ying, W., Hong, Y., Liu, S., Wu, M., Qian, X. and Wang, H. (2006)<br />

Proteome analysis of hepatocellular carcinoma by laser capture microdissection.<br />

Proteomics 6, 538–546.<br />

21. Zhou, G., Li, H., DeCamp, D., Chen, S., Shu, H., Gong, Y., Flaig, M.,<br />

Gillespie, J. W., Hu, N., Taylor, P. R., Emmert-Buck, M. R., Liotta, L. A.,<br />

Petricoin, E. F. 3rd and Zhao, Y. (2002) 2D differential in-gel electrophoresis for<br />

the identification of esophageal scans cell cancer-specific protein markers. Mol<br />

Cell Proteomics 1, 117–124.<br />

22. Lee, J. R., Baxter, T. M., Yamaguchi, H., Wang, T. C., Goldenring, J. R. and<br />

Anderson, M. G. (2003) Differential protein analysis of spasmolytic polypeptide<br />

expressing metaplasia using laser capture microdissection and two-dimensional<br />

difference gel electrophoresis. Appl Immunohistochem Mol Morphol 11, 188–193.<br />

23. Hudelist, G., Singer, C. F., Pischinger, K. I., Kaserer, K., Manavi, M., Kubista, E.<br />

and Czerwenka, K. F. (2006) Proteomic analysis in human breast cancer: identification<br />

of a characteristic protein expression profile of malignant breast epithelium.<br />

Proteomics 6, 1989–2002.<br />

24. Shaw, J., Rowlinson, R., Nickson, J., Stone, T., Sweet, A., Williams, K. and<br />

Tonge, R. (2003) Evaluation of saturation labelling two-dimensional difference gel<br />

electrophoresis fluorescent dyes. Proteomics 3, 1181–1195.



25. Greengauz-Roberts, O., Stoppler, H., Nomura, S., Yamaguchi, H.,<br />

Goldenring, J. R., Podolsky, R. H., Lee, J. R. and Dynan, W. S. (2005) Saturation<br />

labeling with cysteine-reactive cyanine fluorescent dyes provides increased sensitivity<br />

for protein expression profiling of laser-microdissected clinical specimens.<br />

Proteomics 5, 1746–1757.<br />

26. Sitek, B., Potthoff, S., Schulenborg, T., Stegbauer, J., Vinke, T., Rump, L. C.,<br />

Meyer, H. E., Vonend, O. and Stuhler, K. (2006) Novel approaches to analyse<br />

glomerular proteins from smallest scale murine and human samples using DIGE<br />

saturation labelling. Proteomics 6, 4337–4345.<br />

27. Li, C., Hong, Y., Tan, Y. X., Zhou, H., Ai, J. H., Li, S. J., Zhang, L., Xia, Q. C.,<br />

Wu, J. R., Wang, H. Y. and Zeng, R. (2004) Accurate qualitative and quantitative<br />

proteomic analysis of clinical hepatocellular carcinoma using laser capture<br />

microdissection coupled with isotope-coded affinity tag and two-dimensional liquid<br />

chromatography mass spectrometry. Mol Cell Proteomics 3, 399–409.<br />

28. Gygi, S. P., Rist, B., Gerber, S. A., Turecek, F., Gelb, M. H. and Aebersold, R.<br />

(1999) Quantitative analysis of complex protein mixtures using isotope-coded<br />

affinity tags. Nat Biotechnol 17, 994–999.<br />

29. Zang, L., Palmer Toy, D., Hancock, W. S., Sgroi, D. C. and Karger, B. L. (2004)<br />

Proteomic analysis of ductal carcinoma of the breast using laser capture microdissection,<br />

LC-MS, and ¹⁶O/¹⁸O isotopic labeling. J Proteome Res 3, 604–612.<br />

30. Poznanovic, S., Wozny, W., Schwall, G. P., Sastri, C., Hunzinger, C.,<br />

Stegmann, W., Schrattenholz, A., Buchner, A., Gangnus, R., Burgemeister, R. and<br />

Cahill, M. A. (2005) Differential radioactive proteomic analysis of microdissected<br />

renal cell carcinoma tissue by 54 cm isoelectric focusing in serial immobilized pH<br />

gradient gels. J Proteome Res 4, 2117–2125.<br />

31. Neubauer, H., Clare, S. E., Kurek, R., Fehm, T., Wallwiener, D., Sotlar, K.,<br />

Nordheim, A., Wozny, W., Schwall, G. P., Poznanovic, S., Sastri, C.,<br />

Hunzinger, C., Stegmann, W., Schrattenholz, A. and Cahill, M. A. (2006)<br />

Breast cancer proteomics by laser capture microdissection, sample pooling, 54-<br />

cm IPG IEF, and differential iodine radioisotope detection. Electrophoresis 27,<br />

1840–1852.<br />

32. Preisler, J., Hu, P., Rejtar, T., Moskovets, E. and Karger, B. L. (2002) Capillary<br />

array electrophoresis-MALDI mass spectrometry using a vacuum deposition<br />

interface. Anal Chem 74, 17–25.<br />

33. Bergstrom, S. K., Samskog, J. and Markides, K. E. (2003) Development<br />

of a poly(dimethylsiloxane) interface for on-line capillary column liquid<br />

chromatography-capillary electrophoresis coupled to sheathless electrospray<br />

ionization time-of-flight mass spectrometry. Anal Chem 75, 5461–5467.<br />

34. Wu, S. L., Hancock, W. S., Goodrich, G. G. and Kunitake, S. T. (2003) An approach<br />

to the proteomic analysis of a breast cancer cell line (SKBR-3). Proteomics 3,<br />

1037–1046.<br />

35. Gozal, Y. M., Cheng, D., Duong, D. M., Lah, J. J., Levey, A. I. and Peng, J. (2006)<br />

Merger of laser capture microdissection and mass spectrometry: a window into the<br />

amyloid plaque proteome. Methods Enzymol 412, 77–93.



36. Guo, J., Colgan, T. J., DeSouza, L. V., Rodrigues, M. J., Romaschin, A. D.<br />

and Siu, K. W. (2005) Direct analysis of laser capture microdissected endometrial<br />

carcinoma and epithelium by matrix-assisted laser desorption/ionization mass<br />

spectrometry. Rapid Commun Mass Spectrom 19, 2762–2766.<br />

37. de Groot, C. J., Steegers-Theunissen, R. P., Guzel, C., Steegers, E. A. and<br />

Luider, T. M. (2005) Peptide patterns of laser dissected human trophoblasts<br />

analyzed by matrix-assisted laser desorption/ionisation-time of flight mass<br />

spectrometry. Proteomics 5, 597–607.<br />

38. Umar, A., Dalebout, J. C., Timmermans, A. M., Foekens, J. A. and Luider, T.<br />

M. (2005) Method optimisation for peptide profiling of microdissected breast<br />

carcinoma tissue by matrix-assisted laser desorption/ionisation-time of flight<br />

and matrix-assisted laser desorption/ionisation-time of flight/time of flight-mass<br />

spectrometry. Proteomics 5, 2680–2688.<br />

39. Umar, A., Luider, T. M., Foekens, J. A. and Pasa-Tolic, L. (2007) NanoLC-FT-ICR<br />

MS improves proteome coverage attainable for approximately 3000 laser-microdissected<br />

breast carcinoma cells. Proteomics 7, 323–329.<br />

40. Melle, C., Bogumil, R., Ernst, G., Schimmel, B., Bleul, A. and von Eggeling, F.<br />

(2006) Detection and identification of heat shock protein 10 as a biomarker in<br />

colorectal cancer by protein profiling. Proteomics 6, 2600–2608.<br />

41. Melle, C., Ernst, G., Schimmel, B., Bleul, A., Koscielny, S., Wiesner, A.,<br />

Bogumil, R., Moller, U., Osterloh, D., Halbhuber, K. J. and von Eggeling, F.<br />

(2003) Biomarker discovery and identification in laser microdissected head and<br />

neck squamous cell carcinoma with ProteinChip technology, two-dimensional gel<br />

electrophoresis, tandem mass spectrometry, and immunohistochemistry. Mol Cell<br />

Proteomics 2, 443–452.<br />

42. Zheng, Y., Xu, Y., Ye, B., Lei, J., Weinstein, M. H., O’Leary, M. P., Richie, J. P.,<br />

Mok, S. C. and Liu, B. C. (2003) Prostate carcinoma tissue proteomics for<br />

biomarker discovery. Cancer 98, 2576–2582.<br />

43. Cutler, P. (2003) Protein arrays: the current state-of-the-art. Proteomics 3, 3–18.<br />

44. Dekker, L. J., Burgers, P. C., Guzel, C. and Luider, T. M. (2007) FTMS and<br />

TOF/TOF mass spectrometry in concert: identifying peptides with high reliability<br />

using matrix prespotted MALDI target plates. J Chromatogr B Analyt Technol<br />

Biomed Life Sci 847, 62–64.<br />

45. Mustafa, D. A., Burgers, P. C., Dekker, L. J., Charif, H., Titulaer, M. K.,<br />

Smitt, P. A., Luider, T. M. and Kros, J. M. (2007) Identification of glioma<br />

neovascularization-related proteins by using MALDI-FTMS and nano-LC fractionation<br />

to microdissected tumor vessels. Mol Cell Proteomics 6, 1147–1157.


III<br />

Clinical Proteomics by LC-MS Approaches


10<br />

Comparison of Protein Expression by Isotope-Coded<br />

Affinity Tag Labeling<br />

Zhen Xiao and Timothy D. Veenstra<br />

Summary<br />

Isotope-coded affinity tag (ICAT) labeling, in combination with mass spectrometry<br />

(MS), has been widely adopted as an effective method for comparing protein abundance<br />

levels. This chapter describes the ICAT labeling procedure in search for the celecoxibregulated<br />

proteins in a colon cancer cell line. Celecoxib, a cyclooxygenase-2 (COX-2)<br />

specific inhibitor, is used as a colorectal cancer preventative drug in clinical trials. Here,<br />

celecoxib is used to inhibit the expression of COX-2 in a colon cancer cell line HT-29.<br />

To elucidate the proteomic changes induced by celecoxib, protein lysates from the<br />

treated and control cells are prepared. The cysteine-containing proteins are labeled with the<br />

heavy and light ICAT reagents, respectively. The labeled proteins are then combined and<br />

digested with trypsin. The ICAT-labeled peptides are purified on<br />

an avidin column, and finally the biotin tags are cleaved. This chapter focuses on<br />

the ICAT labeling procedure itself, because sample preparation is the most critical step of<br />

an ICAT-based protein expression comparison experiment. Other related procedures, such<br />

as cation exchange high-performance liquid chromatography separation of peptides<br />

and MS analysis, are detailed elsewhere in this book.<br />

Key Words: isotope-coded affinity tags; quantitative proteomics; mass spectrometry.<br />

1. Introduction<br />

The application of mass spectrometry (MS) has rapidly expanded from<br />

simple identification of protein components to the quantitative comparison<br />

of proteomic changes under various biological and physiological conditions<br />

(1,2,3). In many studies, it is desirable to identify proteins and quantify their<br />

From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols<br />

Edited by: A. Vlahou © Humana Press, Totowa, NJ<br />



182 Xiao and Veenstra<br />

levels simultaneously using MS. While the ability to target specific molecules<br />

for quantitation is well established, experimental and technical issues<br />

make the direct quantitation of hundreds (or thousands) of species in a single<br />

MS experiment extremely challenging and limit its accuracy (4,5,6,7).<br />

To overcome this hurdle, a variety of chemical-based labeling and derivatization<br />

techniques have been developed (5,7,8,9). One of these techniques, isotope-coded<br />

affinity tags (ICATs), has been widely adopted and remains the model<br />

system on which most other differential labeling methods have been based<br />

(10). The reagent used in ICAT studies is composed of four<br />

parts: (1) an iodoacetamide group that covalently reacts with cysteine residues<br />

within proteins; (2) an isotope-coded linker region, which is prepared in two<br />

distinct versions containing either nine ¹³C (heavy version) or nine ¹²C (light<br />

version); (3) a biotin tag that facilitates the purification of labeled peptides via<br />

its specific binding to avidin; and (4) an acid-labile bond that is situated between<br />

the biotin and isotopically differential domain of the reagent (Fig. 1). After<br />

labeling the cysteine residues, the protein mixture is enzymatically digested<br />

(usually with trypsin) and the labeled peptides purified via avidin chromatography.<br />

Following the enrichment of the ICAT-labeled peptides, the cleavable<br />

linker and the biotin tag are removed using trifluoroacetic acid (TFA). The<br />

removal of the biotin tag reduces the mass of the remaining tag attached to the<br />

peptide and increases the fragmentation efficiency and ultimately the success<br />

rate of peptide identification by tandem MS.<br />

The advantage of ICAT labeling is the identical chemistry, yet differential<br />

mass, of the heavy and light reagents, which enables the protein<br />

abundances within two complex proteome samples to be compared simultaneously.<br />

Following their coelution from a nanoflow reversed-phase liquid<br />

chromatography column, the light- and heavy-labeled peptides are easily recognized<br />

within the mass spectrum, being separated by ∼9 Da per labeled cysteine. The tandem MS<br />

spectrum enables the peptide to be identified, while the ratio of the two<br />

peak areas is used as a measure of the peptide’s relative abundance in<br />

the samples being compared. Since their inception, ICAT reagents have been<br />

modified, improved, and made available commercially by Applied Biosystems

Fig. 1. The structure of cleavable isotope-coded affinity tag reagent.


Isotope-Coded Affinity Tag Labeling 183<br />

as a kit (11). The combination of ICAT labeling, peptide fractionation, and<br />

liquid chromatography–tandem mass spectrometry has enabled the rapid<br />

and simultaneous identification and quantitation of changes in complex protein<br />

mixtures (12,13,14,15,16).<br />
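The ∼9 Da spacing between the light- and heavy-labeled forms described above applies per labeled cysteine, and on the m/z axis it is divided by the charge state. The sketch below assumes the cleavable reagent's nine-carbon isotope difference (9 × 1.003355 Da, the ¹³C–¹²C mass difference) and computes the expected m/z of the heavy partner and a heavy/light ratio from peak areas. The function names and example values are hypothetical and not part of any vendor software.

```python
C13_C12_DELTA = 1.003355        # mass difference between 13C and 12C, in Da
ICAT_DELTA = 9 * C13_C12_DELTA  # ~9.03 Da per labeled cysteine (cleavable ICAT)


def heavy_partner_mz(light_mz: float, charge: int, n_cys: int) -> float:
    """Expected m/z of the heavy-labeled form of a light-labeled peptide ion."""
    if charge < 1 or n_cys < 1:
        raise ValueError("charge and cysteine count must be positive")
    return light_mz + n_cys * ICAT_DELTA / charge


def heavy_to_light_ratio(heavy_area: float, light_area: float) -> float:
    """Relative abundance from peak areas (heavy = treated, light = control here)."""
    return heavy_area / light_area


# Hypothetical doubly charged peptide with one cysteine, light form at m/z 800.40:
print(round(heavy_partner_mz(800.40, charge=2, n_cys=1), 3))  # 804.915
print(heavy_to_light_ratio(3.0e6, 1.5e6))  # 2.0, i.e. twofold higher in the heavy sample
```

A peptide carrying two cysteines would show twice the spacing, which is one way such pairs can be distinguished from unrelated neighboring peaks.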

In this chapter, the ICAT labeling procedure is described as part of an experiment<br />

to identify celecoxib-induced proteomic changes in colon cancer cells.<br />

Celecoxib is a nonsteroidal anti-inflammatory drug that specifically inhibits<br />

cyclooxygenase-2 (COX-2) (17,18). In clinical trials, it has been shown to<br />

inhibit the development of precancerous polyposis in the colon (19,20). In this<br />

study, a COX-2-expressing colon cancer cell line (HT-29) is used (21,22).<br />

After the cells are treated with celecoxib, cell lysates are prepared and<br />

labeled with the ICAT reagents. A schematic diagram of the ICAT labeling<br />

and peptide analysis procedure is shown in Fig. 2. Because the core of ICAT-based<br />

quantitative proteomic analysis is sample preparation, this chapter is<br />

dedicated to the details of the ICAT labeling protocol itself. For information<br />

on strong cation exchange (SCX) high performance liquid chromatography<br />

(HPLC) separation of peptides, analysis by nanoflow reversed-phase liquid<br />

chromatography tandem mass spectrometry, and bioinformatics analysis, refer<br />

to the chapter on “Analysis of the Extracellular Matrix and Secreted Vesicle<br />

Proteomes by Mass Spectrometry,” (Subheadings 3.6–3.8). The methods<br />

described in this chapter can be used to (1) understand the proteomic changes<br />

in response to a drug; (2) elucidate the molecular mechanisms underlying the<br />

drug’s effects; and (3) search for biomarkers or endpoints that can be used to<br />

monitor and evaluate therapeutic and intervention approaches.<br />

2. Materials<br />

2.1. Cell Culture and Harvest<br />

1. T-75 cell culture flasks<br />

2. McCoy’s 5a medium supplemented with 10% (v/v) fetal bovine serum, 50 U/mL<br />

penicillin, 50 μg/mL streptomycin, and 1.5 mM L-glutamine (American Type<br />

Culture Collection (ATCC), Manassas, VA)<br />

3. Dimethylsulfoxide (DMSO, cell culture use)<br />

4. HT-29 cell line (ATCC, Manassas, VA)<br />

5. Celecoxib (Pfizer, New York, NY)<br />

6. 75 μM celecoxib: dissolve celecoxib in DMSO to make a 100 mM stock solution.<br />

Further dilute to 75 μM with McCoy’s 5a cell culture medium. Use medium containing the same<br />

concentration of DMSO as the negative control<br />

7. Sterile phosphate-buffered saline (PBS) solution<br />

8. 500 mM EDTA, pH 8<br />

9. 2 mM EDTA in sterile PBS: add 80 μL of 500 mM EDTA, pH 8, to 20 mL of PBS<br />

10. Centrifuge (maximum force: ∼17,000×g)
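The working solutions in this list follow the usual C1·V1 = C2·V2 dilution relation. As an illustration, the sketch below computes how much 100 mM celecoxib stock gives 75 μM, the DMSO percentage that the vehicle control must match, and the actual EDTA concentration of the EDTA-PBS recipe. The 20 mL final medium volume is an assumption, not a value from the protocol, and the helper names are hypothetical.

```python
def stock_volume(c_stock: float, c_final: float, v_final: float) -> float:
    """Volume of stock giving c_final in a final volume v_final (C1*V1 = C2*V2).

    c_stock and c_final must share units; the result has the units of v_final.
    """
    return c_final * v_final / c_stock


def final_concentration(c_stock: float, v_stock: float, v_diluent: float) -> float:
    """Concentration after adding v_stock of stock to v_diluent of diluent."""
    return c_stock * v_stock / (v_stock + v_diluent)


# Celecoxib: 100 mM (= 100,000 uM) DMSO stock diluted to 75 uM.
final_ml = 20.0  # assumed final volume of medium (illustrative only)
v_stock_ml = stock_volume(100_000, 75, final_ml)
dmso_percent = v_stock_ml / final_ml * 100  # DMSO % (v/v) the control must match
print(f"{v_stock_ml * 1000:.0f} uL stock, {dmso_percent:.3f}% DMSO")

# EDTA-PBS: 80 uL of 500 mM EDTA added to 20 mL of PBS.
print(f"{final_concentration(500, 0.080, 20.0):.2f} mM EDTA")
```

The first line prints 15 uL of stock and 0.075% DMSO; the EDTA recipe comes out at about 1.99 mM, which is why it is described as 2 mM.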



Fig. 2. Schematic diagram of the ICAT labeling procedure applied to the quantitative<br />

proteomic analysis.<br />

2.2. Cell Lysis, Desalting, and Protein Quantitation<br />

1. Lysis buffer: 50 mM Tris–HCl, pH 7.2, 1% Triton X-100, 10 mM sodium fluoride<br />

(NaF), 1 mM sodium orthovanadate (Na₃VO₄), and 1 mM EDTA<br />

2. Digital sonifier (Model 250, Branson Ultrasonics Corporation, Danbury, CT)<br />

3. Bicinchoninic acid (BCA) protein assay reagent kit (Pierce, Rockford, IL)<br />

4. D-Salt™ Excellulose plastic desalting column, 5 mL (maximum binding capacity<br />

is 1.25 mg per column) (Pierce, Rockford, IL)<br />

5. 50 mM NH₄HCO₃, pH 8.3



6. Coomassie blue reagent: Coomassie Plus – The Better Bradford™ Assay Reagent<br />

(Pierce, Rockford, IL)<br />

7. Centrifuge (maximum force: ∼17,000×g)<br />

8. Vacuum centrifuge<br />

2.3. Denaturing and Reducing the Proteins<br />

1. Denaturing buffer: 6 M guanidine in 50 mM NH₄HCO₃, pH 8.3<br />

2. 100 mM tris(2-carboxyethyl)phosphine (TCEP) (Pierce, Rockford, IL)<br />

3. Boiling water bath<br />

2.4. Labeling with Cleavable ICAT Reagents, Desalting,<br />

and Tryptic Digestion<br />

1. Cleavable ICAT™ reagents (light and heavy sulfhydryl-modifying biotinylating<br />

reagents). Store at –20 °C. One unit of either light or heavy reagent labels 100 μg<br />

of protein. The regular kit offers both reagents in 1 unit/tube. The bulk kit offers<br />

both reagents in 10 units/tube. The method described here is based on the use of<br />

a regular kit, i.e., 1 unit that labels 100 μg of protein/tube. (Applied Biosystems,<br />

Foster City, CA)<br />

2. Acetonitrile<br />

3. 37 °C water bath<br />

4. D-Salt™ Excellulose plastic desalting column, 5 mL (Pierce, Rockford, IL)<br />

5. 50 mM NH₄HCO₃, pH 8.3<br />

6. Coomassie blue reagent: Coomassie Plus – The Better Bradford™ Assay Reagent<br />

(Pierce, Rockford, IL)<br />

7. Trypsin gold, MS grade (Promega, Madison, WI)<br />

2.5. Purifying the Labeled Peptides<br />

1. Phenylmethanesulfonyl fluoride (PMSF) (Sigma Chemical Co., St. Louis, MO)<br />

2. Glass wool<br />

3. 5¾-inch disposable Pasteur glass pipettes<br />

4. UltraLink™ Immobilized Monomeric Avidin slurry [50% (v/v)] (Pierce,<br />

Rockford, IL)<br />

5. Teflon tubing that fits the tip of the 5¾-inch disposable Pasteur glass pipettes<br />

6. 2× PBS buffer, pH 7.2: dissolve 14.2 g of Na₂HPO₄ and 8.77 g of NaCl in<br />

450 mL of H₂O. Adjust the pH to 7.2 by adding about 350 μL of 85% (v/v) H₃PO₄.<br />

Add H₂O to a total volume of 500 mL. The final concentrations are 200 mM<br />

Na₂HPO₄ and 300 mM NaCl<br />

7. 1× PBS, pH 7.2: dilute 2× PBS 1:1 in H₂O<br />

8. 2 mM biotin solution: dissolve 9.8 mg of D-biotin ImmunoPure (MW 244.31,<br />

Pierce, Rockford, IL) in 20 mL of 2× PBS, pH 7.2<br />

9. Acetonitrile [20% (v/v)] in 50 mM NH₄HCO₃, pH 8.3<br />

10. Acetonitrile [30% (v/v)] containing 0.4% (v/v) formic acid



11. pH paper (pH 2–9)<br />

12. Dry ice<br />
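The concentrations stated for the 2× PBS and biotin solutions can be checked from mass, molecular weight and volume (moles = grams / MW). In the sketch below, the molecular weights of anhydrous Na₂HPO₄ (141.96 g/mol) and NaCl (58.44 g/mol) are standard values supplied here as assumptions; the protocol itself gives only the biotin MW (244.31). The helper name is hypothetical.

```python
def molarity_mM(mass_g: float, mw_g_per_mol: float, volume_l: float) -> float:
    """Concentration (mM) of mass_g of solute made up to volume_l."""
    return mass_g / mw_g_per_mol / volume_l * 1000


# 2x PBS: 14.2 g Na2HPO4 (anhydrous MW assumed) and 8.77 g NaCl in 500 mL total.
na2hpo4 = molarity_mM(14.2, 141.96, 0.500)
nacl = molarity_mM(8.77, 58.44, 0.500)
# Biotin: 9.8 mg D-biotin (MW 244.31) in 20 mL of 2x PBS.
biotin = molarity_mM(9.8e-3, 244.31, 0.020)

print(f"Na2HPO4 {na2hpo4:.0f} mM, NaCl {nacl:.0f} mM, biotin {biotin:.2f} mM")
```

This prints about 200 mM Na₂HPO₄, 300 mM NaCl and 2.01 mM biotin, consistent with the stated recipes. If a hydrated phosphate salt is used instead, its higher MW would require a correspondingly larger mass.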

2.6. Cleaving Biotin<br />

1. Cleaving reagent A (10 mL) (Applied Biosystems, Foster City, CA): contains<br />

concentrated TFA. Store in fume hood at room temperature<br />

2. Cleaving reagent B (Applied Biosystems, Foster City, CA): store at –20 °C<br />

3. 37 °C water bath<br />

4. Vacuum centrifuge<br />

3. Methods<br />

3.1. Cell Culture and Harvest<br />

1. On day 1, plate HT-29 cells in T-75 flasks at 5 × 10⁶ cells/flask.<br />

2. On day 2, aspirate medium. Culture cells with fresh medium containing 75 μM<br />

of celecoxib or DMSO (negative control).<br />

3. On day 3, 24 h after treating cells, aspirate cell culture medium. Rinse cells once<br />

quickly with 6 mL of PBS.<br />

4. Add 3 mL of 2 mM EDTA-PBS per flask and place the flask in the 37 °C incubator.<br />

Monitor the detachment of cells carefully. Cells usually detach within 5 min;<br />

celecoxib-treated cells detach in less than 5 min (see Note 1).<br />

5. Tap the side of the flask against the palm of your hand to dislodge the cells. When the<br />

cells are visibly detached, add 7 mL of PBS to flask. Resuspend cells and transfer<br />

cell suspension to a 15 mL centrifuge tube. Harvest the treated and control cells<br />

in separate tubes.<br />

6. Centrifuge the cell suspension at 500×g for 5 min. Remove the supernatant.<br />

7. Wash cell pellet with 10 mL of PBS three times. Centrifuge at 500×g for 5 min.<br />

Remove PBS after each centrifugation.<br />

8. The cell pellet is now ready for lysis. Keep the pellet on ice before proceeding to the<br />

next step, or store it at –80 °C.<br />

3.2. Cell Lysis, Desalting and Protein Quantitation<br />

1. Add 500 μL of lysis buffer to the cell pellet harvested from each T-75 flask.<br />

Transfer the resuspended cells to a 1.5 mL Eppendorf tube. Vortex briefly.<br />

2. Clean the sonifier probe with H₂O and then methanol, and let it air-dry before use.<br />

3. To break the cells, set the digital sonifier amplitude to 16%. Hold up the<br />

Eppendorf tube containing the suspended cells so that the probe is immersed halfway into the<br />

lysis buffer. Pulse for 10 s, then pause for 50 s; repeat this cycle five times. Rest the<br />

tube on ice during each pause, and lift it up again before the next 10 s<br />

pulse starts (see Note 2).<br />

4. Clean the sonifier probe as in step 2 before starting the next sample.<br />

5. Centrifuge cell lysate at 15,000×g for 15 min at 4 °C.



6. Transfer the cell lysate (supernatant) to a fresh Eppendorf tube (see Note 3).<br />

7. Quantify the protein in cell lysate using the BCA assay (see Note 4).<br />

8. Prepare the desalting column (D-Salt™ Excellulose Plastic Desalting Column, 5 mL,<br />

Pierce) by washing it with 5× bed volume (i.e., 25 mL) of 50 mM<br />

NH₄HCO₃, pH 8.3 (see Note 5).<br />

9. Based on the BCA assay results, load up to 1.25 mg of protein (cell lysate) onto each<br />

desalting column. Discard the flow-through (see Note 6).<br />

10. Add 0.5 mL of 50 mM NH₄HCO₃, pH 8.3, to the column and collect the flow-through<br />

in one Eppendorf tube. Repeat this step to collect the eluate in a total of<br />

seven 0.5 mL fractions.<br />

11. Take 10 μL of eluate from each fraction and mix with 300 μL (1:30) of Coomassie<br />

blue reagent (Pierce). Visually examine the color of each tube. The color of<br />

the protein-containing fractions should change from brown to blue. Proteins<br />

normally elute in fractions 3–5.<br />

12. Pool the protein-containing fractions and mix well. Discard the fractions that do not contain<br />

protein.<br />

13. Measure the protein concentration using the BCA assay (see Note 4).<br />

14. Based on the BCA assay results, transfer 800 μg of protein from each of the<br />

treated and control samples into two separate eppendorf tubes (see Note 7).<br />

15. Lyophilize these two samples in a vacuum centrifuge (see Note 8).<br />
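Steps 9 and 14 involve two small calculations: how many 1.25 mg desalting columns a given amount of protein requires, and what volume of desalted lysate contains 800 μg. A minimal sketch follows; the 2.0 mg/mL concentration in the example is a hypothetical BCA result, not a value from the protocol, and the helper names are illustrative.

```python
import math

COLUMN_CAPACITY_MG = 1.25  # maximum load per 5 mL D-Salt column (Subheading 2.2)


def columns_needed(total_protein_mg: float) -> int:
    """Smallest number of desalting columns that keeps each load within capacity."""
    return math.ceil(total_protein_mg / COLUMN_CAPACITY_MG)


def volume_for_mass(target_ug: float, conc_ug_per_ul: float) -> float:
    """Volume (uL) of a protein solution that contains target_ug of protein."""
    return target_ug / conc_ug_per_ul


# Hypothetical BCA result after desalting: 2.0 mg/mL, i.e. 2.0 ug/uL.
print(volume_for_mass(800, 2.0))  # 400.0 uL holds the 800 ug needed in step 14
print(columns_needed(1.6))        # 2 columns for the 1.6 mg in Subheading 3.4
```

Rounding up with `math.ceil` reflects the rule that no column should be loaded beyond its stated capacity.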

3.3. Denaturing and Reducing the Proteins<br />

1. Freshly prepare denaturing buffer and 100 mM TCEP.<br />

2. Add denaturing buffer and 100 mM TCEP to the protein samples. For 800 μg of<br />

protein, add 640 μL of denaturing buffer and 8 μL of TCEP (see Note 9).<br />

3. Vortex until the sample is completely dissolved in the buffer.<br />

4. Boil the sample for 10 min.<br />

5. Vortex to mix well. Spin the samples briefly in a centrifuge. Cool to room<br />

temperature.<br />
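The volumes in step 2 scale linearly with the amount of protein (640 μL of denaturing buffer and 8 μL of 100 mM TCEP per 800 μg). The sketch below, using a hypothetical helper name, scales them to another protein amount and reports the resulting TCEP and protein concentrations; it assumes the volumes are simply additive.

```python
def denaturation_volumes(protein_ug: float) -> dict:
    """Scale the Subheading 3.3 volumes (640 uL buffer + 8 uL TCEP per 800 ug)."""
    scale = protein_ug / 800
    buffer_ul = 640 * scale
    tcep_ul = 8 * scale
    total_ul = buffer_ul + tcep_ul  # volumes assumed additive
    return {
        "denaturing_buffer_ul": buffer_ul,
        "tcep_ul": tcep_ul,
        "tcep_final_mM": 100 * tcep_ul / total_ul,  # diluted from the 100 mM stock
        "protein_ug_per_ul": protein_ug / total_ul,
    }


v = denaturation_volumes(800)
print(f"{v['tcep_final_mM']:.2f} mM TCEP, {v['protein_ug_per_ul']:.2f} ug/uL protein")
```

For 800 μg this gives about 1.23 mM TCEP and about 1.23 μg/μL of protein, so each 80 μL aliquot used in the labeling step carries roughly the 100 μg one reagent unit is meant to label.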

3.4. Labeling with Cleavable ICAT Reagents, Desalting,<br />

and Tryptic Digestion<br />

1. Remove the ICAT reagents from the –20 °C freezer and bring them to room temperature.<br />

Avoid exposing them to light. To label 800 μg of protein (control or treated),<br />

use eight tubes of reagent (light or heavy; each tube labels 100 μg of protein). Spin<br />

briefly in a centrifuge to bring the powder down from the wall to the bottom of the<br />

tube.<br />

2. In the chemical hood with the lights off, add 20 μL of acetonitrile to each of the<br />

eight reagent tubes (light or heavy). Add 80 μL (i.e., 100 μg) of protein sample to<br />

each tube. Tighten the tube caps. Vortex to mix well. Spin briefly in a centrifuge<br />

(see Note 10).



3. Pool the control and treated sample mixtures (eight tubes of light and eight tubes of heavy reagent,<br />

respectively) into two tubes. This pooling should result in one light-labeled and one heavy-labeled<br />

tube, each containing 800 μL of protein mixture.<br />

4. Incubate the samples in the 37 °C water bath for 2 h. Protect the samples from<br />

light.<br />

5. Combine the light- and heavy-labeled samples together into one tube. Proceed<br />

with desalting.<br />

6. Use the same type of desalting column as in the previous section. Because the binding<br />

capacity per column is 1.25 mg, prepare two columns for the total of 1.6 mg of<br />

labeled protein. Wash each column with 5× bed volume (i.e., 25 mL) of 50 mM<br />

NH₄HCO₃, pH 8.3 (see Note 11).<br />

7. Load 800 μg of the combined, labeled proteins onto each column. Follow steps<br />

8–12 in Subheading 3.2. At the end of elution, pool the protein-containing eluate<br />

fractions (usually fractions 3–5) into one 15 mL tube (see Note 12).<br />

8. Freshly prepare trypsin by reconstituting 20 μg of trypsin in 20 μL of 50 mM<br />

NH₄HCO₃, pH 8.3. Add trypsin to the labeled protein at a trypsin-to-protein ratio<br />

of 1:40 (w/w). For 1.6 mg of protein, add 40 μg of trypsin (see Note 13).<br />

9. Wrap the 15 mL tube with aluminum foil. Incubate at 37 °C overnight (see<br />

Note 14).<br />
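The reagent amounts in this subheading follow from ratios given in the text: one unit of ICAT reagent per 100 μg of protein, 1.25 mg of protein per desalting column, and trypsin at 1:40 (w/w) from a 1 μg/μL stock (20 μg in 20 μL). A sketch of that bookkeeping follows; the function and constant names are hypothetical.

```python
import math

UG_PER_REAGENT_UNIT = 100      # one ICAT reagent unit labels 100 ug of protein
COLUMN_CAPACITY_UG = 1250      # desalting column capacity (1.25 mg)
PROTEIN_PER_TRYPSIN = 40       # trypsin-to-protein ratio of 1:40 (w/w)
TRYPSIN_STOCK_UG_PER_UL = 1.0  # 20 ug trypsin reconstituted in 20 uL


def icat_plan(protein_per_sample_ug: float) -> dict:
    """Bookkeeping for one light-labeled (control) and one heavy-labeled (treated) sample."""
    combined_ug = 2 * protein_per_sample_ug  # both labeled samples are pooled
    trypsin_ug = combined_ug / PROTEIN_PER_TRYPSIN
    return {
        "reagent_tubes_per_label": math.ceil(protein_per_sample_ug / UG_PER_REAGENT_UNIT),
        "desalting_columns": math.ceil(combined_ug / COLUMN_CAPACITY_UG),
        "trypsin_ug": trypsin_ug,
        "trypsin_stock_ul": trypsin_ug / TRYPSIN_STOCK_UG_PER_UL,
    }


print(icat_plan(800))
# {'reagent_tubes_per_label': 8, 'desalting_columns': 2, 'trypsin_ug': 40.0, 'trypsin_stock_ul': 40.0}
```

The same function can be rerun for a smaller experiment. For example, 500 μg per sample needs five reagent tubes per label and a single desalting column.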

3.5. Purifying the Labeled Peptides<br />

1. Boil the peptide solution for 10 min to deactivate trypsin.<br />

2. Freshly prepare 100 mM PMSF in methanol. Vortex to dissolve well.<br />

3. Add PMSF at a 1:100 dilution (v/v) to the trypsin-digested samples. For 3 mL<br />

of digests, add 30 μL of PMSF. The final PMSF concentration is 1 mM. Vortex<br />

briefly to mix.<br />

4. Prepare the avidin column: gently place a small plug of glass wool into a 5¾-inch<br />

Pasteur glass pipette and push it down from the top about 4½ inches. This plug<br />

creates a support for the resin to settle onto (see Note 15).<br />

5. Add 0.5 mL of water to the pipette. Let the water level fall until it reaches the<br />

glass wool. At this point, the flow should stop naturally. Block the bottom of<br />

the pipette. Then slowly add 1.5 mL of water to the pipette. Mark the water<br />

level as an indicator of the 1.5 mL volume.<br />

6. Gradually add the avidin slurry to the 1.5 mL mark. Connect Teflon tubing to<br />

the pipette tip to increase the flow rate (see Note 16).<br />

7. Condition the column with the following wash buffers, in this order (see Note 17):

a. 2× PBS, pH 7.2, 8 mL (5× bed volume)

b. 2 mM biotin solution, 6 mL (4× bed volume)

c. 30% (v/v) acetonitrile, 0.4% (v/v) formic acid, 6 mL (4× bed volume)

d. 2× PBS, pH 7.2, 8 mL (5× bed volume)

8. Sample loading and incubation: remove the Teflon tubing. Load 1.5 mL of the digest onto the column. After the sample has flowed through, incubate at room


Isotope-Coded Affinity Tag Labeling 189<br />

temperature for 15 min. Load another 1.5 mL (or the rest) of sample. Incubate<br />

for 15 min (see Note 18).<br />

9. Reconnect the Teflon tubing to the pipette tip. Wash the column, which now carries the bound ICAT-labeled peptides, with the following buffers in this order:

a. 2× PBS, pH 7.2, 8 mL (5× bed volume)

b. 1× PBS, pH 7.2, 8 mL (5× bed volume)

c. 20% (v/v) acetonitrile in 50 mM NH4HCO3, pH 8.3, 6 mL (4× bed volume)

10. Final wash: remove the Teflon tubing. Add 1.3 mL (slightly less than the bed volume) of 30% (v/v) acetonitrile, 0.4% (v/v) formic acid as a final wash and discard the flow-through. Measure the pH of the last drop of this wash with pH paper. The pH should be >8 (basic), indicating that the acidic wash has not passed through the bed, so the peptides have not been eluted and are still retained on the beads (see Note 19).

11. Elute the peptides with 4 mL of 30% (v/v) acetonitrile, 0.4% (v/v) formic acid into one 15 mL tube. Mix well and divide into four 1 mL aliquots. Briefly freeze the peptides on dry ice or at –80 °C, then lyophilize in a vacuum centrifuge (see Note 20).
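The volumes in steps 3 and 7–10 follow simple rules: PMSF is added at 1:100 (v/v), washes are multiples of the 1.5 mL bed volume, and the last-drop pH of the final wash is read against the thresholds in step 10 and Note 19. The Python sketch below encodes these rules. Rounding wash volumes up to whole milliliters is an assumption inferred from the listed values (8 mL for 5× and 6 mL for 4× bed volume), and the "inconclusive" label for pH values between 3 and 8 is not part of the protocol.

```python
import math

BED_VOLUME_ML = 1.5    # avidin bed volume marked in step 5
PMSF_STOCK_MM = 100.0  # freshly prepared PMSF in methanol (step 2)


def pmsf_addition_ul(digest_ml: float, dilution: int = 100) -> float:
    """Volume of PMSF stock for a 1:dilution (v/v) addition to the digest."""
    return digest_ml * 1000.0 / dilution


def final_pmsf_mm(digest_ml: float, added_ul: float) -> float:
    """PMSF concentration after mixing the stock into the digest."""
    added_ml = added_ul / 1000.0
    return PMSF_STOCK_MM * added_ml / (digest_ml + added_ml)


def wash_volume_ml(bed_multiple: int, bed_ml: float = BED_VOLUME_ML) -> int:
    """Wash volume as a multiple of bed volume, rounded up to whole mL (assumed)."""
    return math.ceil(bed_multiple * bed_ml)


def interpret_last_drop_ph(ph: float) -> str:
    """Read the final-wash pH as described in step 10 and Note 19."""
    if ph > 8:
        return "retained"          # acidic wash has not broken through the bed
    if ph < 3:
        return "possible elution"  # labeled peptides may have started to elute
    return "inconclusive"          # range not addressed by the protocol


if __name__ == "__main__":
    added = pmsf_addition_ul(3.0)
    print(f"PMSF: {added:.0f} uL -> {final_pmsf_mm(3.0, added):.2f} mM")
    print("2x PBS:", wash_volume_ml(5), "mL; biotin:", wash_volume_ml(4), "mL")
```

The computed PMSF concentration (slightly under 1 mM) shows why the protocol describes the final value as 1 mM: the added stock volume is only 1% of the digest.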

3.6. Cleaving Biotin<br />

1. Prepare the cleaving reagent mixture in a chemical hood. For 1.6 mg of labeled peptides, mix 760 μL of cleaving reagent A with 40 μL of cleaving reagent B. Divide the mixture equally among the four dried peptide aliquots, 200 μL each (see Note 21).

2. Close the tube caps. Vortex well to dissolve the peptides.<br />

3. Incubate the samples in a 37 °C water bath for 2 h.<br />

4. When the incubation is finished, pool all the aliquots. Freeze briefly on dry ice or at –80 °C and lyophilize the peptides in a vacuum centrifuge.

5. Store at –80 °C prior to the next step (i.e., fractionation by SCX HPLC).<br />
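Note 21 gives the cleaving-reagent ratio per 200 μg of labeled peptides (95 μL of reagent A plus 5 μL of reagent B), and step 1 scales it to 1.6 mg. A minimal Python sketch of that scaling follows, assuming the mixture is split evenly across the four aliquots from Subheading 3.5, step 11; the function name is illustrative.

```python
def cleaving_mix_ul(labeled_peptide_ug: float, aliquots: int = 4) -> dict:
    """Scale cleaving reagents A and B (95 + 5 uL per 200 ug, Note 21)."""
    units = labeled_peptide_ug / 200.0
    reagent_a = 95.0 * units
    reagent_b = 5.0 * units
    return {
        "reagent_A_ul": reagent_a,
        "reagent_B_ul": reagent_b,
        "per_aliquot_ul": (reagent_a + reagent_b) / aliquots,
    }


if __name__ == "__main__":
    print(cleaving_mix_ul(1600.0))
```

For 1.6 mg, this reproduces the 760 μL of reagent A and 40 μL of reagent B given in step 1.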

4. Notes<br />

1. Dislodging cells using a low concentration of EDTA preserves the integrity of<br />

cell surface proteins, which is critical in quantitative proteomic analysis.<br />

2. For the Branson digital sonifier, use the following program settings: pulse on for<br />

10 s; off for 50 s; amplitude = 16%. If bubbles are generated during sonication,<br />

decrease the amplitude setting. Depending on the sample volume, the setting<br />

can sometimes be lowered to 14%. The clumps of cells should disappear when<br />

sonication is complete.<br />

3. After this step the cell lysate can be stored at –80 °C. Otherwise, proceed to the<br />

next step, i.e., BCA assay and desalting.<br />

4. Protein quantitation is a common laboratory procedure. The instructions are<br />

included within the BCA assay kit (Pierce); therefore, the procedure is not<br />

described in this chapter.


190 Xiao and Veenstra<br />

5. It is helpful to assemble a funnel reservoir on the top of the column to hold a<br />

larger volume (up to 25 mL) of buffer.<br />

6. The maximum binding capacity of the desalting column is 1.25 mg of protein<br />

per column.<br />

7. The method described here is based on the labeling of 800 μg of protein from<br />

each of the treated and control samples. This amount of protein is desirable if<br />

enough cell lysate is available. However, as little as 100 μg of protein from each<br />

of the treated and control samples can be labeled using this protocol.<br />

8. It takes about 3 h to lyophilize the samples. If necessary, leave the samples in the vacuum centrifuge overnight to dry.

9. It is important to keep the pH of the cell lysate above 7 (ideally between 8 and<br />

9). A pH below 7 will inhibit the reaction between cysteine residues and the<br />

iodoacetamide group of the ICAT reagents.<br />

10. Usually the control sample is labeled with the light reagent and the treated<br />

sample is labeled with the heavy reagent.<br />

11. To save time, set up the two columns on the stand during the 2-h labeling incubation. Attaching a funnel reservoir to the top of each column, to hold up to 25 mL of wash buffer, is recommended.

12. The pooled sample volume is normally about 3 mL. Desalted samples may appear opaque because of the protein they contain.

13. Instead of using the buffer provided by the manufacturer, resuspend trypsin in 50 mM NH4HCO3, pH 8.3. Keep the trypsin-to-protein ratio between 1:40 and 1:50.

14. The digestion mixture is incubated overnight for approximately 16–18 h.<br />

15. Make sure the glass wool is well packed, with no holes, while still allowing liquid to flow through at a reasonable rate. Check the flow rate by adding 0.5 mL of water to the pipette; the water should flow through quickly. The flow rate will be considerably slower once the avidin slurry is packed into the column, so avoid packing either too much or too little glass wool.

16. The protein binding capacity of avidin slurry is 1.6 mg protein per milliliter of<br />

packed avidin. One 1.5 mL column should offer sufficient capacity to enrich the<br />

labeled peptides from 1.6 mg of protein.<br />

17. Binding 2 mM biotin to the column and then eluting with 30% (v/v) acetonitrile, 0.4% (v/v) formic acid preclears the column of potential nonspecific binding activity.

18. The Teflon tubing is a useful tool for adjusting the flow rate. Connecting the Teflon tubing to the tip of the column increases the flow rate; without the tubing attached, the flow rate is slower.

19. The final wash is intended to remove nonspecifically bound material. Using a volume slightly less than the bed volume ensures that the labeled peptides are retained on the column. Adjust the volume of the final wash buffer to the actual bed volume: when the bed volume of avidin is smaller,


the volume of the final wash buffer needs to be scaled down. If the pH of the<br />

last drop is less than 3, the labeled peptides may have started to elute, meaning<br />

potential loss of the labeled peptides.<br />

20. Perform the elution in a chemical fume hood to avoid inhaling acetonitrile. Quick-freezing the samples on dry ice can prevent spillage during vacuum centrifugation and shortens the time needed for the samples to dry.

21. For every 200 μg of labeled peptides (i.e., 100 μg each of heavy- and light-labeled material in the pair), first mix 95 μL of cleaving reagent A with 5 μL of cleaving reagent B, then transfer the mixture to the labeled peptides.
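Notes 16 and 19 imply two checks when the avidin bed differs from 1.5 mL: whether its capacity (1.6 mg per mL of packed avidin) covers the protein load, and how far to scale down the 1.3 mL final wash. The Python sketch below assumes the final wash scales in proportion to bed volume; the protocol says only that it "needs to be scaled down", so the proportional rule and the function names are illustrative assumptions.

```python
AVIDIN_CAPACITY_MG_PER_ML = 1.6  # Note 16
REFERENCE_BED_ML = 1.5           # bed volume used in Subheading 3.5
REFERENCE_FINAL_WASH_ML = 1.3    # final wash volume, Subheading 3.5, step 10


def avidin_capacity_mg(bed_ml: float) -> float:
    """Binding capacity of a packed avidin bed of the given volume."""
    return AVIDIN_CAPACITY_MG_PER_ML * bed_ml


def bed_is_sufficient(protein_mg: float, bed_ml: float) -> bool:
    """True if the bed capacity covers the labeled protein load."""
    return avidin_capacity_mg(bed_ml) >= protein_mg


def scaled_final_wash_ml(bed_ml: float) -> float:
    """Final wash kept at the same fraction of bed volume as 1.3/1.5 mL (assumed)."""
    return REFERENCE_FINAL_WASH_ML * bed_ml / REFERENCE_BED_ML


if __name__ == "__main__":
    for bed in (1.5, 0.75):
        print(bed, round(avidin_capacity_mg(bed), 2),
              bed_is_sufficient(1.6, bed), round(scaled_final_wash_ml(bed), 2))
```

With the 1.5 mL bed, the capacity (2.4 mg) exceeds the 1.6 mg load, consistent with Note 16; a bed half that size would not suffice for the full load.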

Acknowledgments<br />

This project has been funded in whole or in part with Federal funds from<br />

the National Cancer Institute, National Institutes of Health, under Contract No.<br />

N01-CO-12400. The content of this publication does not necessarily reflect<br />

the views or policies of the Department of Health and Human Services, nor<br />

does mention of trade names, commercial products, or organization imply<br />

endorsement by the U.S. Government.<br />

References<br />

1. Aebersold, R., Rist, B. and Gygi, S. P. (2000) Quantitative proteome analysis:<br />

methods and applications. Ann N Y Acad Sci 919, 33–47.<br />

2. Gygi, S. P., Rist, B. and Aebersold, R. (2000) Measuring gene expression by<br />

quantitative proteome analysis. Curr Opin Biotechnol 11, 396–401.<br />

3. Yates, J. R. 3rd. (2004) Mass spectral analysis in proteomics. Annu Rev Biophys<br />

Biomol Struct 33, 297–316.<br />

4. Ong, S. E. and Mann, M. (2005) Mass spectrometry-based proteomics turns quantitative.<br />

Nat Chem Biol 1, 252–262.<br />

5. Zieske, L. R. (2006) A perspective on the use of iTRAQ reagent technology for<br />

protein complex and profiling studies. J Exp Bot 57, 1501–1508.<br />

6. Yan, W. and Chen, S. S. (2005) Mass spectrometry-based quantitative proteomic<br />

profiling. Brief Funct Genomic Proteomic 4, 27–38.<br />

7. Bronstrup, M. (2004) Absolute quantification strategies in proteomics based on<br />

mass spectrometry. Expert Rev Proteomics 1, 503–512.<br />

8. Conrads, T. P., Issaq, H. J. and Hoang, V. M. (2003) Current strategies for quantitative<br />

proteomics. Adv Protein Chem 65, 133–159.<br />

9. Leitner, A. and Lindner, W. (2004) Current chemical tagging strategies for<br />

proteome analysis by mass spectrometry. J Chromatogr B Analyt Technol Biomed<br />

Life Sci 813, 1–26.<br />

10. Gygi, S. P., Rist, B., Gerber, S. A., Turecek, F., Gelb, M. H. and Aebersold, R.<br />

(1999) Quantitative analysis of complex protein mixtures using isotope-coded<br />

affinity tags. Nat Biotechnol 17, 994–999.


11. Flory, M. R., Griffin, T. J., Martin, D. and Aebersold, R. (2002) Advances in<br />

quantitative proteomics using stable isotope tags. Trends Biotechnol 20, S23–S29.<br />

12. Han, D. K., Eng, J., Zhou, H. and Aebersold, R. (2001) Quantitative profiling of<br />

differentiation-induced microsomal proteins using isotope-coded affinity tags and<br />

mass spectrometry. Nat Biotechnol 19, 946–951.<br />

13. Conrads, K. A., Yu, L. R., Lucas, D. A., Zhou, M., Chan, K. C., Simpson, K. A.,<br />

Schaefer, C. F., Issaq, H. J., Veenstra, T. D., Beck, G. R. Jr. and Conrads, T. P.<br />

(2004) Quantitative proteomic analysis of inorganic phosphate-induced murine<br />

MC3T3-E1 osteoblast cells. Electrophoresis 25, 1342–1352.<br />

14. Gygi, S. P., Rist, B., Griffin, T. J., Eng, J. and Aebersold, R. (2002) Proteome<br />

analysis of low-abundance proteins using multidimensional chromatography and<br />

isotope-coded affinity tags. J Proteome Res 1, 47–54.<br />

15. Tao, W. A. and Aebersold, R. (2003) Advances in quantitative proteomics via<br />

stable isotope tagging and mass spectrometry. Curr Opin Biotechnol 14, 110–118.<br />

16. Conrads, K. A., Yi, M., Simpson, K. A., Lucas, D. A., Camalier, C. E., Yu, L. R.,<br />

Veenstra, T. D., Stephens, R. M., Conrads, T. P. and Beck, G. R. Jr. (2005) A<br />

combined proteome and microarray investigation of inorganic phosphate-induced<br />

pre-osteoblast cells. Mol Cell Proteomics 4, 1284–1296.<br />

17. Koehne, C. H. and Dubois, R. N. (2004) COX-2 inhibition and colorectal cancer.<br />

Semin Oncol 31, 12–21.<br />

18. Sinicrope, F. A. and Gill, S. (2004) Role of cyclooxygenase-2 in colorectal cancer.<br />

Cancer Metastasis Rev 23, 63–75.<br />

19. Steinbach, G., Lynch, P. M., Phillips, R. K., Wallace, M. H., Hawk, E.,<br />

Gordon, G. B., Wakabayashi, N., Saunders, B., Shen, Y., Fujimura, T., Su, L. K.<br />

and Levin, B. (2000) The effect of celecoxib, a cyclooxygenase-2 inhibitor, in<br />

familial adenomatous polyposis. N Engl J Med 342, 1946–1952.<br />

20. Thun, M. J., Henley, S. J. and Patrono, C. (2002) Nonsteroidal anti-inflammatory<br />

drugs as anticancer agents: mechanistic, pharmacologic, and clinical issues. J Natl<br />

Cancer Inst 94, 252–266.<br />

21. Arico, S., Pattingre, S., Bauvy, C., Gane, P., Barbat, A., Codogno, P. and Ogier-Denis, E. (2002) Celecoxib induces apoptosis by inhibiting 3-phosphoinositide-dependent protein kinase-1 activity in the human colon cancer HT-29 cell line. J Biol Chem 277, 27613–27621.

22. Lev-Ari, S., Strier, L., Kazanov, D., Madar-Shapiro, L., Dvory-Sobol, H.,<br />

Pinchuk, I., Marian, B., Lichtenberg, D. and Arber, N. (2005) Celecoxib and<br />

curcumin synergistically inhibit the growth of colorectal cancer cells. Clin Cancer<br />

Res 11, 6738–6744.


11<br />

Analysis of Microdissected Cells by Two-Dimensional<br />

LC-MS Approaches<br />

Chen Li, Yi-Hong, Ye-Xiong Tan, Jian-Hua Ai, Hu Zhou, Su-Jun Li,<br />

Lei Zhang, Qi-Chang Xia, Jia-Rui Wu, Hong-Yang Wang, and Rong Zeng<br />

Summary<br />

Laser capture microdissection (LCM) is a powerful tool that enables the isolation of specific cell types from tissue sections, overcoming the problems of tissue heterogeneity and contamination. We combined LCM with isotope-coded affinity tag (ICAT) technology and two-dimensional liquid chromatography to investigate the qualitative and quantitative proteomes of hepatocellular carcinoma (HCC). The effects of three different histochemical stains on tissue sections were compared, and toluidine blue proved to be the most suitable stain for LCM followed by proteomic analysis. The solubilized proteins from microdissected HCC and non-HCC hepatocytes were qualitatively and quantitatively analyzed by two-dimensional liquid chromatography tandem mass spectrometry (2D-LC-MS/MS), alone or coupled with cleavable isotope-coded affinity tag (cICAT) labeling. In total, 644 proteins were qualitatively identified and 261 proteins were unambiguously quantified. These results show that the clinical proteomic method combining LCM with ICAT and 2D-LC-MS/MS supports not only large-scale but also accurate qualitative and quantitative analysis.

Key Words: hepatocellular carcinoma; laser capture microdissection; isotope-coded<br />

affinity tag; two-dimensional liquid chromatography; mass spectrometry.<br />

1. Introduction<br />

Hepatocellular carcinoma (HCC) is one of the most frequent tumors<br />

worldwide. There are 0.25–1 million newly diagnosed cases of HCC each year<br />

(1). The highest frequencies of HCC are observed in sub-Saharan Africa and<br />

From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols<br />

Edited by: A. Vlahou © Humana Press, Totowa, NJ<br />

193


194 Li et al.<br />

in Asia. In China, HCC has been the second leading cause of cancer death since the 1990s. The major risk factors for HCC are chronic hepatitis B virus (HBV) and hepatitis C virus (HCV) infection, chronic exposure to the mycotoxin aflatoxin B1 (AFB1), and alcoholic cirrhosis. To date, the mainstays of HCC diagnosis have been serological tumor markers, such as alpha-fetoprotein, the L3 fraction of alpha-fetoprotein, and PIVKA-II, together with imaging modalities (1,2,3).

To improve the diagnosis and prognosis of HCC, there is an urgent need to identify molecular markers of the disease. Tissue samples from patients with HCC may be the most direct and persuasive source of useful diagnostic and/or prognostic markers. Recently, proteomic analysis has been applied to HCC tissues. Paik et al. analyzed nineteen cases of HCC by two-dimensional electrophoresis (2DE) and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS) (4,5,6). Jung et al. observed proteome alterations in normal, cirrhotic, and tumorous tissue using 2DE-MALDI-TOF-MS (7). Kim et al. analyzed 11 cases of HCC using 2DE and delayed extraction matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (DE-MALDI-TOF-MS) (8).

Nonenzymatic sample preparation (NESP) is now one of the routine techniques for tissue sample preparation, and it can be modified according to tissue-type-specific properties (9). However, problems may arise from tissue heterogeneity and from contaminating proteins, e.g., blood proteins. Several approaches have been developed to resolve these problems, and the selection of cell types of interest by dissection has received a great deal of attention. Since 1996, a laser-assisted technique, laser capture microdissection (LCM), has emerged as a good choice. LCM under direct microscopic visualization permits rapid, one-step procurement of selected cell populations from a section of complex, heterogeneous tissue (10,11). LCM has been used to isolate specific types of cells for protein, DNA, and RNA analysis. In proteomics, proteins obtained from laser capture microdissected cells have been analyzed by two-dimensional gel electrophoresis (2DE gel) (12,13), immunoassay (14,15), and surface-enhanced laser desorption and ionization time-of-flight (SELDI-TOF) mass spectrometry (16,17,18,19,20,21). The only shortcoming of LCM may be the long time required to collect sufficient cells for one experiment: 2–7 h for 20,000–40,000 cells per immunoassay and 15 h for 250,000 cells per 2DE gel (22).

Our previous work applied proteomic analysis to HCC cell lines (23,24) and HCC metastatic cells (25), and we have since extended this work to clinical tissues using LCM. However, the present LCM assay yields only several hundred micrograms of protein after several hours of dissection, an amount that is hard to analyze by the traditional 2DE-MS proteomic route, especially when preparative 2DE gels are followed by MS identification.


Proteomic Analysis of Clinical HCC Using LCM 195<br />

Since 1999, the isotope-coded affinity tag (ICAT) strategy has been a leading technology for relative protein quantification based on post-harvest stable isotope labeling (26). Post-harvest labeling with stable isotopes can be used for protein quantification in cells and tissues from any organism, and the ICAT method as initially described was shown to quantify proteins accurately in complex mixtures (26). Following the first-generation ²H-ICAT reagents, the second-generation cleavable ¹³C-ICAT reagents provided improved performance (27,28,29). Two-dimensional chromatography coupled with MS/MS has been shown to identify large numbers of proteins, including low-abundance proteins (30,31).
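To make the quantification principle concrete: the light and heavy cleavable ICAT reagents differ by nine ¹³C atoms, so each labeled cysteine shifts a peptide by about 9.03 Da, which appears in the spectrum as an m/z spacing of 9.03 × n/z for n labeled cysteines at charge z. The Python sketch below pairs a light peak with its heavy partner and reports the light-to-heavy intensity ratio. It is a simplified illustration only: practical ICAT quantification is usually based on integrated elution profiles rather than single peaks, and the `Peak` class, tolerance, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

C13_C12_DELTA_DA = 1.0033548                 # mass difference between 13C and 12C
CICAT_HEAVY_SHIFT_DA = 9 * C13_C12_DELTA_DA  # heavy cICAT tag carries nine 13C atoms


@dataclass
class Peak:
    mz: float
    charge: int
    intensity: float


def expected_spacing(n_cys: int, charge: int) -> float:
    """m/z spacing between the light and heavy forms of a labeled peptide."""
    return n_cys * CICAT_HEAVY_SHIFT_DA / charge


def find_heavy_partner(light: Peak, peaks: List[Peak], n_cys: int = 1,
                       tol: float = 0.01) -> Optional[Peak]:
    """Most intense same-charge peak within tol of the expected heavy m/z."""
    target = light.mz + expected_spacing(n_cys, light.charge)
    candidates = [p for p in peaks
                  if p.charge == light.charge and abs(p.mz - target) <= tol]
    return max(candidates, key=lambda p: p.intensity, default=None)


def light_to_heavy_ratio(light: Peak, heavy: Peak) -> float:
    return light.intensity / heavy.intensity


if __name__ == "__main__":
    light = Peak(600.000, 2, 2.0e5)
    spectrum = [Peak(604.515, 2, 1.0e5), Peak(604.515, 3, 5.0e5), Peak(610.0, 2, 9.0e5)]
    heavy = find_heavy_partner(light, spectrum)
    if heavy is not None:
        print(f"heavy at m/z {heavy.mz}, L/H = {light_to_heavy_ratio(light, heavy):.2f}")
```

With the labeling scheme used later in this chapter (HCC proteins with the light reagent, non-HCC proteins with the heavy reagent), a light-to-heavy ratio reads directly as HCC/non-HCC.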

In this study, we used LCM to isolate HCC and non-HCC hepatocytes and, for the first time, combined LCM with cleavable isotope-coded affinity tag (cICAT) labeling technology and two-dimensional liquid chromatography tandem mass spectrometry (2D-LC-MS/MS) to carry out accurate qualitative and quantitative analysis of HCC and non-HCC tissues. The workflow is outlined in Fig. 1. In total, 644 proteins in HCC hepatocytes were qualitatively identified, and 261 differential proteins between HCC and non-HCC hepatocytes were quantified. To date, this is one of the largest qualitative and quantitative proteome datasets for HCC and non-HCC tissues. Our strategy and method provide an accurate, fast, and sensitive approach for proteomic analysis of clinical tissues, which should facilitate understanding of the mechanisms of HCC and other diseases and the mining of potential markers and drug targets for diagnosis and treatment.

[Fig. 1 workflow: frozen sections of HCC tissues → staining with toluidine blue → laser capture microdissection → HCC hepatocytes and non-HCC hepatocytes → solubilized proteins → labeling with cICAT light chain and heavy chain → digestion of protein mixture → 2D-LC-MS/MS → analysis by bioinformatics]

Fig. 1. Outline of accurate qualitative and quantitative proteomic analysis of clinical hepatocellular carcinoma using laser capture microdissection coupled with isotope-coded affinity tag and two-dimensional liquid chromatography mass spectrometry. Reprinted with permission from (34).

2. Materials<br />

2.1. Tissue Specimen and Sample Preparation by Nonenzymatic<br />

Method (NESP)<br />

1. Tissues from an HCC patient are obtained from fresh partial hepatectomy specimens at the Shanghai Eastern Hepatobiliary Surgery Hospital. Access to human tissues complies with both Chinese law and the guidelines of the Ethics Committee.

2. Glutamine-free RPMI 1640 medium containing 5% fetal calf serum, 0.2 mM phenylmethylsulfonyl fluoride, 1 mM ethylenediaminetetraacetic acid tetrasodium salt dihydrate (EDTA), and antibiotics: oxacillin 25 μg/ml, gentamicin 50 μg/ml, penicillin 100 U/ml, streptomycin 100 μg/ml, amphotericin B 0.25 μg/ml, nystatin 50 U/ml. Store at 4°C.

3. Ceramic mortar and pestle (SIBAS Corp. Shanghai, China).<br />

4. Lysis buffer: 8 M urea, 4% 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate (CHAPS), 40 mM Tris-HCl (pH 8.3), 65 mM dithiothreitol (DTT). Store in aliquots at –80°C.

5. Proteinase inhibitor tablet mixture (Roche).<br />
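The lysis buffer in item 4 can be prepared from solid reagents; a small Python calculator for the weights is sketched below. It assumes that the 4% CHAPS is w/v and that Tris is weighed as Tris base and titrated to pH 8.3 with HCl, and it uses standard molar masses (urea 60.06, Tris base 121.14, DTT 154.25 g/mol). Because 8 M urea occupies a substantial volume, dissolve the solids and then bring the solution to the final volume rather than adding them to the full volume of water. The function names are illustrative.

```python
MOLAR_MASS_G_PER_MOL = {"urea": 60.06, "Tris base": 121.14, "DTT": 154.25}


def grams_for_molarity(component: str, molar: float, volume_ml: float) -> float:
    """Mass of solid needed for a given molar concentration and final volume."""
    return molar * (volume_ml / 1000.0) * MOLAR_MASS_G_PER_MOL[component]


def lysis_buffer_recipe(volume_ml: float = 10.0) -> dict:
    """8 M urea, 4% CHAPS, 40 mM Tris, 65 mM DTT (Subheading 2.1, item 4)."""
    return {
        "urea_g": grams_for_molarity("urea", 8.0, volume_ml),
        "CHAPS_g": 0.04 * volume_ml,  # 4% (w/v) = 4 g per 100 mL (assumed w/v)
        "Tris_base_g": grams_for_molarity("Tris base", 0.040, volume_ml),
        "DTT_g": grams_for_molarity("DTT", 0.065, volume_ml),
    }


if __name__ == "__main__":
    for name, grams in lysis_buffer_recipe(10.0).items():
        print(f"{name}: {grams:.4f}")
```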

2.2. Laser Capture Microdissection<br />

1. Tissues from an HCC patient are obtained from fresh partial hepatectomy specimens at the Shanghai Eastern Hepatobiliary Surgery Hospital. Access to human tissues complies with both Chinese law and the guidelines of the Ethics Committee. The tissues are from a 50-year-old male patient with Edmondson grade III HCC (HBV-infected; AFP, 7.3 μg/L; tumor size, 15 × 13 × 10.5 cm).

2. Freezing microtome CM1900 (Leica).<br />

3. O.C.T. compound (Tissue-Tek).<br />

4. Hematoxylin, eosin, and toluidine blue stains (Shanghai Genebase Corp.).

5. Leica AS LMD Laser Capture Microdissection System (Leica).<br />

6. Lysis buffer: 8 M urea, 4% CHAPS, 40 mM Tris, 65 mM DTT. Store in aliquots at –80°C.

7. Proteinase inhibitor tablet mixture (Roche).


2.3. Removal of Toluidine Blue and Digestion of Protein Mixture for Qualitative Analysis

1. Precipitation solution: 50% acetone, 50% ethanol, 0.1% acetic acid (HAc). Store<br />

at –20°C.<br />

2. Redissolution buffer: 6 M guanidine HCl, 100 mM Tris-HCl (pH 8.3). Store at 4°C.

3. DTT and iodoacetamide (IAA) are from Bio-Rad. Sequencing-grade TPCK-trypsin is from Promega.

4. YM3 ultrafiltration membranes (molecular mass cutoff, 3 kDa) are from Millipore<br />

Corp. All buffers are prepared with Milli-Q water (Millipore).<br />

2.4. Cleavable Isotope-Coded Affinity Tag Labeling of Proteins<br />

1. Tri-n-butylphosphate (TBP) is from Bio-Rad.<br />

2. cICAT light or heavy reagents, Avidin cartridge, affinity buffer–elute, affinity<br />

buffer–load, affinity buffer–wash 1, affinity buffer–wash 2, cleaving reagents A<br />

and B are from Applied Biosystems.<br />

3. Sequencing grade TPCK-trypsin (Promega).<br />

4. YM3 ultrafiltration membranes (molecular mass cutoff, 3 kDa) are from Millipore<br />

Corp. All buffers are prepared with Milli-Q water (Millipore).<br />

2.5. One-Dimensional and Two-Dimensional Liquid Chromatography<br />

Coupled with Tandem Mass Spectrometry<br />

1. Formic acid is obtained from Aldrich, and acetonitrile (HPLC gradient grade) is<br />

obtained from Merck.<br />

2. The LCQ Deca XP system, ProteomeX Workstation and TurboSequest<br />

software are purchased from Thermo Electron Corporation.<br />

2.6. Bioinformatics Analysis<br />

1. ExPASy proteomics tools are accessed from cn.expasy.org/tools/#proteome.<br />

2. Program TMHMM 2.0 is accessed from the Center for Biological Sequence<br />

Analysis (www.cbs.dtu.dk/services/TMHMM/).<br />

3. Classification tools are accessed from www.geneontology.org.<br />

3. Methods<br />

In brief, two principles should guide the whole process of LCM coupled with 2D-LC-MS/MS: speed and the removal of impurities. Sample preparation by LCM must be done as quickly as possible, including fixation of fresh tissues, preparation of frozen sections, histochemical staining, microdissection, and so on. Impurities, such as histochemical stains, should be removed as completely as possible by centrifugation, precipitation, and ultrafiltration before trypsin digestion and LC-MS/MS analysis.

Fixation and histochemical staining are the two initial steps in LCM, and the appropriate choice of fixation and staining methods is an important factor in the process. In this work, we used freshly prepared liver tissues to make frozen sections (8 μm thick) and fixed the sections with ethanol to avoid effects on proteins such as the crosslinking caused by formalin fixation. Several histochemical stains (hematoxylin, eosin, methyl green, and toluidine blue) have been tested with 2DE gels (33). That work showed that staining with a single stain (hematoxylin) was better than staining with two stains simultaneously (hematoxylin and eosin), and that methyl green and toluidine blue staining were both compatible with protein analysis by 2D-PAGE. The results with toluidine blue staining indicated a direct link between the intensity of tissue section staining and problems in obtaining good-quality protein separations.

In our study, the proteins from microdissected cells were subjected to tryptic digestion and LC-MS/MS analysis. Because the staining material might alter the pH of the digestion buffer or inactivate trypsin, we tried to remove the stains by precipitation and ultrafiltration prior to digestion. We stained frozen sections separately with each of three histochemical stains (hematoxylin, eosin, and toluidine blue). Of these, toluidine blue could be almost completely removed by precipitation in 50% acetone, 50% ethanol, 0.1% acetic acid followed by desalting by ultrafiltration. In addition, proteins from toluidine blue-stained sections solubilized better, whereas colored protein precipitate appeared on the filtration membrane when hematoxylin or eosin was used. We therefore chose toluidine blue and used it to optimize the experimental conditions, including staining, microdissection, and protein digestion.

3.1. Tissue Specimen and Sample Preparation by Nonenzymatic<br />

Method (NESP)<br />

1. The tissues used were from a 50-year-old male patient with Edmondson grade III HCC (HBV-infected; AFP, 7.3 μg/L; tumor size, 15 × 13 × 10.5 cm). Tumorous tissues and their paired adjacent nontumorous tissues (3 cm away from the edge of the HCC lesions, about 0.1 g) were obtained from fresh partial hepatectomy specimens of HBV-associated HCC. Part of the resected tissue was used for histological analysis.

2. The tissues were rinsed several times with cold glutamine-free RPMI 1640 medium and homogenized in a liquid nitrogen-cooled mortar and pestle (see Note 1).

3. The tissue powders obtained were dissolved in lysis buffer (see Note 2).


4. The samples were sonicated on ice for 30 s (intensity below 50 W) using an ultrasonic processor and centrifuged for 1 h at 20,627 × g to remove DNA, RNA, and any particulate material.

5. The protein concentrations of the samples were measured with the Bio-Rad Protein Assay kit. All samples were stored at –80°C until use (see Note 3).
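Centrifugation in Subheading 3.1, step 4 and Subheading 3.2, step 7 is specified as relative centrifugal force (20,627 × g). Converting this to rotor speed depends on the rotor radius, through the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm², with r in centimeters. The Python sketch below applies that relation; the 8.5 cm radius in the example is hypothetical, so substitute the value for your own rotor.

```python
import math

RCF_CONSTANT = 1.118e-5  # RCF = 1.118e-5 * radius_cm * rpm**2


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_for_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed giving the requested relative centrifugal force."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


if __name__ == "__main__":
    radius_cm = 8.5  # hypothetical rotor radius; use your rotor's value
    rpm = rpm_for_rcf(20627, radius_cm)
    print(f"{rpm:.0f} rpm at r = {radius_cm} cm")
```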

3.2. Laser Capture Microdissection<br />

1. Carefully embed fresh tissues in OCT compound in a plastic mold, taking care not to trap air bubbles around the tissue. Freeze the tissue by setting the mold on top of liquid nitrogen until 70–80% of the block turns white, and then place the block on dry ice.

2. For sectioning, mount the frozen block on the cryostat holder. Never, at any point, let the tissue warm above –15°C. Allow frozen blocks to equilibrate in the cryostat chamber for about 5 min. Cut 8-μm sections.

3. Wash 8-μm sections of freshly prepared liver tissue with cold phosphate-buffered saline (PBS, pH 7.4), and stain with toluidine blue according to the manufacturer's standard protocol with minor modifications (see Note 4).

4. Fix the sections in cold 95% ethanol for 10 min, air-dry, and microdissect with the Leica AS LMD Laser Capture Microdissection System.

5. Using laser pulses of 7.5 μm diameter and 70 mW with 2–3 ms duration, microdissect approximately 50,000 or 100,000 HCC and non-HCC hepatocytes, and store them in microdissection caps at –80°C until lysis (see Note 5). An example of the results obtained with a hematoxylin and eosin (H&E)-stained section is shown in Fig. 2.

6. Each cell population was determined to be 95% homogeneous by microscopic<br />

visualization of the captured cells. Dissolve the laser capture microdissected HCC<br />

and non-HCC hepatocytes in lysis buffer (see Note 2).<br />

7. Sonicate the samples briefly on ice using an ultrasonic processor, and centrifuge for 1 h at 20,627 × g to remove DNA, RNA, and any particulate material.

8. Measure the protein concentrations of the samples with the Bio-Rad Protein Assay kit. Store all samples at –80°C until use (see Note 3).

3.3. Removal of Toluidine Blue and Digestion of Protein Mixture for Qualitative Analysis

1. Precipitate the samples prepared by NESP or LCM in precipitation solution (50% acetone, 50% ethanol, 0.1% acetic acid; sample volume:precipitation solution volume = 1:5) for at least 12 h at –20°C. Wash the pellets with 100% acetone and then 70% ethanol, and lyophilize (see Note 6).

2. Redissolve the pellets in 6 M guanidine HCl, 100 mM Tris (pH 8.3); measure the<br />

concentrations with Bio-Rad Protein Assay kit.


Fig. 2. HCC tissues before (A) and after (B) LCM. Reprinted with permission from (34).

4. Reduce 200 μg of solubilized protein with DTT (final concentration 20 mM) and then alkylate with IAA (final concentration 40 mM).

5. After desalting on YM3 ultrafiltration membranes, incubate the protein mixture with trypsin (trypsin-to-protein ratio 1:30, w/w; Promega, Madison, WI) at 37°C for 16 h (see Note 7).
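Steps 1, 4, and 5 involve three small calculations: the 1:5 precipitation volume, the volume of DTT and IAA stock needed to reach 20 mM and 40 mM final concentrations, and the trypsin mass at 1:30 (w/w). The Python sketch below works these out. The 1 M stock concentrations and the 100 μL sample volume in the example are hypothetical, and the stock-addition formula accounts for the small increase in volume as each reagent is added.

```python
def precipitation_volume_ul(sample_ul: float, ratio: float = 5.0) -> float:
    """Precipitation solution volume at sample:solution = 1:ratio (step 1)."""
    return sample_ul * ratio


def stock_addition_ul(current_ul: float, final_mm: float, stock_mm: float) -> float:
    """Stock volume v such that stock_mm * v / (current_ul + v) == final_mm."""
    if stock_mm <= final_mm:
        raise ValueError("stock must be more concentrated than the target")
    return current_ul * final_mm / (stock_mm - final_mm)


def trypsin_ug(protein_ug: float, ratio: float = 30.0) -> float:
    """Trypsin mass at a 1:ratio (w/w) trypsin-to-protein ratio (step 5)."""
    return protein_ug / ratio


if __name__ == "__main__":
    volume = 100.0  # hypothetical volume of the 200 ug protein solution, in uL
    dtt = stock_addition_ul(volume, 20.0, 1000.0)  # hypothetical 1 M DTT stock
    volume += dtt
    iaa = stock_addition_ul(volume, 40.0, 1000.0)  # hypothetical 1 M IAA stock
    print(f"precipitate with {precipitation_volume_ul(100.0):.0f} uL solution")
    print(f"DTT {dtt:.2f} uL, IAA {iaa:.2f} uL, trypsin {trypsin_ug(200.0):.2f} ug")
```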

3.4. Cleavable Isotope-Coded Affinity Tag Labeling of Proteins<br />

1. Reduce 100 μg HCC and 100 μg non-HCC solubilized proteins prepared by LCM<br />

technology with TBP (final concentration 5 mM) (see Note 8).


2. Transfer the reduced HCC and non-HCC solubilized proteins into the vials containing cICAT light and heavy reagent, respectively, and mix. After a brief centrifugation, incubate the proteins for 2 h at 37°C in the dark.

3. Combine the labeled proteins in one tube. After desalting on YM3 ultrafiltration membranes, incubate the protein mixture with trypsin (trypsin-to-protein ratio 1:30, w/w; Promega, Madison, WI) at 37°C for 16 h (see Note 7).

4. Use an Avidin cartridge (Applied Biosystems) to purify the ICAT-labeled peptides from the tryptic digest according to the manufacturer's protocol. In brief, activate the Avidin cartridge with 2 ml of affinity buffer–elute and 2 ml of affinity buffer–load. Slowly inject (∼1 drop/5 s) the peptide sample onto the Avidin cartridge. Wash the cartridge with 500 μl of affinity buffer–load, 1 ml of affinity buffer–wash 1, 1 ml of affinity buffer–wash 2, and 1 ml of Milli-Q water. To elute the labeled peptides, slowly inject (∼1 drop/5 s) affinity buffer–elute and collect the eluate. Dry the eluate by lyophilization.

5. Dissolve the dried cICAT-labeled peptides in cleaving reagent and cleave for 2 h at 37°C. Dry the cleaved peptides by lyophilization.
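In this design the HCC proteins carry the light cICAT tag and the non-HCC proteins carry the heavy tag. The HCC/non-HCC ratio for a peptide is therefore its light/heavy signal ratio. The actual quantification is done with XPRESS (Section 3.5, step 11). The Python sketch below only illustrates the general idea: it integrates the light and heavy extracted ion chromatograms with the trapezoidal rule and takes their ratio. It is not the XPRESS algorithm, and the chromatogram values are invented for illustration.

```python
from typing import Sequence


def xic_area(times: Sequence[float], intensities: Sequence[float]) -> float:
    """Integrate an extracted ion chromatogram with the trapezoidal rule."""
    if len(times) != len(intensities):
        raise ValueError("times and intensities must have the same length")
    area = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        area += 0.5 * dt * (intensities[i] + intensities[i - 1])
    return area


def hcc_to_non_hcc_ratio(times: Sequence[float],
                         light: Sequence[float],
                         heavy: Sequence[float]) -> float:
    """Light (HCC) / heavy (non-HCC) peak-area ratio for one peptide pair."""
    heavy_area = xic_area(times, heavy)
    if heavy_area == 0:
        raise ZeroDivisionError("heavy (non-HCC) peak area is zero")
    return xic_area(times, light) / heavy_area


if __name__ == "__main__":
    t = [0.0, 0.1, 0.2, 0.3, 0.4]      # retention time, min (hypothetical)
    light = [0, 400, 1000, 400, 0]     # HCC channel intensities (hypothetical)
    heavy = [0, 100, 250, 100, 0]      # non-HCC channel intensities (hypothetical)
    print(hcc_to_non_hcc_ratio(t, light, heavy))
```

Because every light intensity in the example is four times the co-eluting heavy intensity, the sketch reports a ratio of 4.0. Real data would also need peak boundary selection and background handling, which are omitted here.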

3.5. One-Dimensional and Two-Dimensional Liquid Chromatography Coupled with Tandem Mass Spectrometry (1D- and 2D-LC-MS/MS)

1. All 2D HPLC separations are performed on a ProteomeX workstation (Thermo Finnigan Corp., San Jose, CA) equipped with two LC pumps. The flow rate of both the salt and analytical pumps is 200 μl/min, reduced to about 2 μl/min after splitting. The strong cation exchange (SCX) column has a 300 μm inner diameter (SCX resin, 5 μm), and the reversed-phase (RPC) column has a 150 μm inner diameter (C18 resin, 300 Å, 5 μm) (see Note 9).

2. Nine ammonium chloride concentrations (0, 25, 50, 75, 100, 150, 200, 400, and 800 mM) are used for the salt step gradient.

3. The reversed-phase mobile phases are A: 0.1% formic acid in water, pH 3.0; and B: 0.1% formic acid in acetonitrile.

4. Load about 200 μg of peptides digested from the LCM protein onto the SCX column with the autosampler, and elute with the salt steps described in step 2. Load the peptides eluted at each salt step onto the RPC column, and wash the column with 20 column volumes of 95% mobile phase A. Then separate the peptides with a 100-min linear gradient from 5 to 80% mobile phase B. The eluting peptides enter the LCQ ProteomeX mass spectrometer (Thermo Electron, San Jose, CA) through a metal needle (see Note 10).

5. The 1D HPLC separation uses the same system and experimental steps, but without the strong cation exchange column.

6. An electrospray ionization (ESI) ion-trap mass spectrometer (LCQ Deca XP, Thermo Finnigan, San Jose, CA) is used for peptide detection.

7. The instrument is operated in positive ion mode, with the spray voltage set at 3.2 kV and the spray temperature set at 150°C for peptides.

8. The collision energy is set automatically by the LCQ Deca XP. After each full-scan mass spectrum, three MS/MS scans are acquired on the three most intense ions, using dynamic exclusion.



9. Peptides and proteins are identified with TurboSEQUEST® (Thermo Finnigan, San Jose, CA). It searches the MS and MS/MS spectra of peptide ions against the publicly available NCBI non-redundant protein database (www.ncbi.nlm.nih.gov).

10. Our protein identification criteria are based on Delta CN (≥0.1) and Xcorr (≥1.8 for singly charged, ≥2.2 for doubly charged, and ≥3.7 for triply charged peptides). An example of the results is shown in Table 1 (see Note 11).

11. For quantitative analysis with cICAT and 2D-LC-MS/MS, database searching and quantification with XPRESS (TurboSEQUEST® software) are followed by manual checking. Quantitative results for 261 proteins from LCM-ICAT-2D-LC-MS/MS are shown in Fig. 3. In our experiment we detected a total of 149 proteins whose expression differed at least twofold between HCC and non-HCC hepatocytes. Of these, 55 were upregulated in HCC hepatocytes (32 by 2- to 5-fold, 13 by 5- to 10-fold, and 10 by more than 10-fold) and 94 were downregulated (62 by 2- to 5-fold, 17 by 5- to 10-fold, and 15 by more than 10-fold).
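The ratio categories used in step 11 and Fig. 3 can be written as a small binning function. The Python sketch below assumes that each protein already has a single HCC/non-HCC ratio, such as one produced by a quantification tool like XPRESS. The category labels and example ratios are hypothetical. The reported counts are taken from step 11 and Fig. 3 and are checked for internal consistency.

```python
from collections import Counter


def fold_change_bin(ratio: float) -> str:
    """Assign an HCC/non-HCC ratio to one of the Fig. 3 categories."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if ratio >= 2:
        direction, fold = "up", ratio          # Ratio(HCC/non-HCC) >= 2
    elif ratio <= 0.5:
        direction, fold = "down", 1.0 / ratio  # Ratio(non-HCC/HCC) >= 2
    else:
        return "<2-fold"
    if fold <= 5:
        return f"{direction} 2-5"
    if fold <= 10:
        return f"{direction} 5-10"
    return f"{direction} >10"


# Protein counts per category as reported in step 11 and Fig. 3.
REPORTED = {"up 2-5": 32, "up 5-10": 13, "up >10": 10,
            "down 2-5": 62, "down 5-10": 17, "down >10": 15,
            "<2-fold": 112}

if __name__ == "__main__":
    ratios = [0.05, 0.3, 1.2, 3.0, 7.5, 12.0]  # hypothetical protein ratios
    print(Counter(fold_change_bin(r) for r in ratios))
    up = sum(v for k, v in REPORTED.items() if k.startswith("up"))
    down = sum(v for k, v in REPORTED.items() if k.startswith("down"))
    print(up, down, up + down, sum(REPORTED.values()))  # 55 94 149 261
```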

3.6. Bioinformatics Analysis

1. The pI and Mr of the proteins are calculated with the ExPASy proteomics tools (http://cn.expasy.org/tools/#proteome). Examples of the results are shown in Table 1 and Fig. 5A and 5B.

Fig. 3. Quantitative analysis results of 261 proteins from LCM-ICAT-2D-LC-MS/MS (categories and protein counts: Ratio(HCC/non-HCC or non-HCC/HCC) < 2, 112; 2 ≤ Ratio(HCC/non-HCC) ≤ 5, 32; 5 < Ratio(HCC/non-HCC) ≤ 10, 13; Ratio(HCC/non-HCC) > 10, 10; 2 ≤ Ratio(non-HCC/HCC) ≤ 5, 62; 5 < Ratio(non-HCC/HCC) ≤ 10, 17; Ratio(non-HCC/HCC) > 10, 15). A total of 149 proteins with at least twofold differential expression between HCC and non-HCC hepatocytes were detected. These comprise 55 proteins upregulated in HCC hepatocytes (32 by 2- to 5-fold, 13 by 5- to 10-fold, and 10 by more than 10-fold) and 94 downregulated (62 by 2- to 5-fold, 17 by 5- to 10-fold, and 15 by more than 10-fold). Reprinted with permission from (34).



Table 1
Summary of Total Proteins Identified in HCC-NESP-1D-LC-MS/MS, HCC-NESP-2D-LC-MS/MS, and HCC-LCM-2D-LC-MS/MS

                                      HCC-NESP-1D-LC-MS/MS   HCC-NESP-2D-LC-MS/MS   HCC-LCM-2D-LC-MS/MS
Protein quantity                      200 μg                 200 μg                 200 μg
Total proteins identified             208                    626                    644
Hydrophobic proteins                  25 (12.0%)             64 (10.2%)             80 (12.4%)
Transmembrane proteins                8 (3.9%)               30 (4.8%)              54 (8.4%)
Proteins with Mr >100 kDa or <10 kDa  19 (9.1%)              77 (12.3%)             75 (11.6%)
Proteins with pI >9                   21 (10.1%)             78 (12.5%)             126 (19.6%)

2. The grand average of hydropathicity (GRAVY) score is calculated as the sum of the hydropathy values of all amino acids in a protein divided by the number of residues (32). Examples of the results are shown in Table 1 and Fig. 5C.

3. Transmembrane regions are predicted with the TMHMM Server 2.0, available from the CBS (http://www.cbs.dtu.dk/services/TMHMM/). Examples of the results are shown in Table 1 and Fig. 5D.

4. All identified proteins are classified by molecular function, cellular component, and biological process with the tools at http://www.geneontology.org. An example of the results is shown in Fig. 4.
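The GRAVY calculation in step 2 is a per-residue average. Below is a minimal Python sketch. It assumes the standard Kyte and Doolittle hydropathy scale (32) and a sequence written in the 20 standard one-letter amino acid codes. Ambiguous codes such as X are rejected rather than guessed, and the example peptide is hypothetical.

```python
# Kyte and Doolittle hydropathy values (ref. 32).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    residues = sequence.strip().upper()
    if not residues:
        raise ValueError("empty sequence")
    unknown = set(residues) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


if __name__ == "__main__":
    print(round(gravy("MKTAYIAKQR"), 3))  # hypothetical peptide
```

Positive scores indicate an overall hydrophobic sequence and negative scores an overall hydrophilic one. The cut-off used to call a protein "hydrophobic" in Table 1 is not stated in this section, so the sketch does not apply one.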

4. Notes

1. Glutamine-free RPMI 1640 medium must be chilled to 4°C before use. Wash the tissues as quickly as possible, until no contaminants (blood, etc.) remain on them. Glutamine-free RPMI 1640 medium can be replaced by PBS (pH 7.4), 0.9% NaCl solution, or any other isotonic buffer.

2. Store the lysis buffer in small aliquots at –8°C to avoid repeated freeze-thaw cycles. Dissolve the protease inhibitor cocktail tablets (Roche Molecular Biochemicals) in the lysis buffer.

3. Store the samples in small aliquots at –8°C to avoid repeated freeze-thaw cycles. The protein concentration of the samples should be about 10 μg/μl for subsequent experiments.

4. Stain the sections only very lightly with toluidine blue, just enough to distinguish hepatocytes during microdissection. Excess stain can interfere with subsequent experiments.

5. To reduce microdissection time, the operator can choose either to capture hepatocytes or to remove other cells, depending on the condition of each section.



Fig. 4. Classification of differentially expressed proteins obtained by LCM-ICAT-2D-LC-MS/MS. (A) Proteins with at least twofold increased expression in HCC hepatocytes. (B) Proteins with at least twofold decreased expression in HCC hepatocytes. Reprinted with permission from (34).

6. The precipitation solution, acetone, and ethanol must be chilled to –20°C before use.

7. Ultrafiltration is essential for removing residual salts, stain, and other impurities that would otherwise interfere with subsequent steps.

8. TBP is a much stronger, but more toxic, reducing agent than DTT for the ICAT labeling reaction.



Fig. 5. Characteristics of differentially expressed proteins obtained by LCM-ICAT-2D-LC-MS/MS. (A) Mr distribution; (B) pI distribution; (C) hydrophilicity and hydrophobicity distribution; (D) transmembrane proteins by number of transmembrane regions. The y-axis of each panel shows protein number. Reprinted with permission from (34).

9. The LCQ ProteomeX Workstation (Thermo Electron, San Jose, CA) is an automated 2D LC/MS system suitable for high-throughput proteomic research. However, other equipment can be used to separate the proteomic sample by offline SCX fractionation. The steps involved in offline SCX fractionation are almost the same as in the online setup. The difference is that the peptides eluted at each salt step must be loaded manually onto the RPC column.

10. If the mass spectrometer is fitted with a nanospray kit and a 75-μm inner diameter RPC column is used, the eluted peptides can enter the mass spectrometer directly. Sensitivity in nanospray mode is higher than in metal-needle mode.

11. The protein identification criteria can vary with the type of mass spectrometer and other analytical needs. For example, with the LTQ linear ion trap mass spectrometer (Thermo Finnigan, San Jose, CA) we use Delta CN (≥0.1) and Xcorr (≥1.9 for singly charged, ≥2.2 for doubly charged, and ≥3.75 for triply charged peptides) as criteria.
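The score thresholds in Section 3.5, step 10 (LCQ) and Note 11 (LTQ) can be applied as a simple filter over peptide-spectrum matches. The Python sketch below assumes that each match has already been exported from the search results as a sequence, precursor charge, Xcorr, and Delta CN. The record layout and example peptides are hypothetical. Matches with charge states above 3+ are rejected because the stated criteria do not cover them.

```python
from dataclasses import dataclass

# Xcorr thresholds by precursor charge: LCQ (Section 3.5, step 10), LTQ (Note 11).
XCORR_THRESHOLDS = {
    "LCQ": {1: 1.8, 2: 2.2, 3: 3.7},
    "LTQ": {1: 1.9, 2: 2.2, 3: 3.75},
}
DELTA_CN_MIN = 0.1


@dataclass
class PeptideMatch:
    sequence: str
    charge: int
    xcorr: float
    delta_cn: float


def passes(match: PeptideMatch, instrument: str = "LCQ") -> bool:
    """True if the match meets the Delta CN and charge-specific Xcorr criteria."""
    thresholds = XCORR_THRESHOLDS[instrument]
    if match.charge not in thresholds:
        return False  # charge states beyond 3+ are not covered by the criteria
    return (match.delta_cn >= DELTA_CN_MIN
            and match.xcorr >= thresholds[match.charge])


if __name__ == "__main__":
    matches = [
        PeptideMatch("PEPTIDEK", 2, 2.5, 0.15),  # hypothetical
        PeptideMatch("SAMPLER", 1, 1.85, 0.20),  # hypothetical
    ]
    for m in matches:
        print(m.sequence, passes(m, "LCQ"), passes(m, "LTQ"))
```

The second example passes the LCQ threshold (Xcorr ≥ 1.8) but fails the stricter LTQ threshold (Xcorr ≥ 1.9) for singly charged ions.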

Acknowledgments

This work was supported by the National High-Technology Project (2001AA233031, 2002BA711A11) and the Basic Research Foundation (2001CB210501).



References

1. Feitelson M.A., Sun B., Satiroglu Tufan N.L., Liu J., Pan J. and Lian Z. (2002) Genetic mechanisms of hepatocarcinogenesis. Oncogene 21, 2593–2604.

2. Fujiyama S., Tanaka M., Maeda S., Ashihara H., Hirata R. and Tomita K. (2002) Tumor markers in early diagnosis, follow-up and management of patients with hepatocellular carcinoma. Oncology 62(Suppl 1), 57–63.

3. Qin L.X. and Tang Z.Y. (2002) The prognostic molecular markers in hepatocellular carcinoma. World J Gastroenterol 8, 385–392.

4. Park K.S., Cho S.Y., Kim H. and Paik Y.K. (2002) Proteomic alterations of the variants of human aldehyde dehydrogenase isozymes correlate with hepatocellular carcinoma. Int J Cancer 97, 261–265.

5. Park K.S., Kim H., Kim N.G., Cho S.Y., Choi K.H., Seong J.K. and Paik Y.K. (2002) Proteomic analysis and molecular characterization of tissue ferritin light chain in hepatocellular carcinoma. Hepatology 35, 1459–1466.

6. Cho S.Y., Park K.S., Shim J.E., Kwon M.S., Joo K.H., Lee W.S., Chang J., Kim H., Chung H.C., Kim H.O. and Paik Y.K. (2002) An integrated proteome database for two-dimensional electrophoresis data analysis and laboratory information management system. Proteomics 2, 1104–1113.

7. Lim S.O., Park S.J., Kim W., Park S.G., Kim H.J., Kim Y.I., Sohn T.S., Noh J.H. and Jung G. (2002) Proteome analysis of hepatocellular carcinoma. Biochem Biophys Res Commun 291, 1031–1037.

8. Kim J., Kim S.H., Lee S.U., Ha G.H., Kang D.G., Ha N.Y., Ahn J.S., Cho H.Y., Kang S.J., Lee Y.J., Hong S.C., Ha W.S., Bae J.M., Lee C.W. and Kim J.W. (2002) Proteome analysis of human liver tumor tissue by two-dimensional gel electrophoresis and matrix assisted laser desorption/ionization-mass spectrometry for identification of disease-related proteins. Electrophoresis 23, 4142–4156.

9. Franzen B., Hirano T., Okuzawa K., Uryu K., Alaiya A.A., Linder S. and Auer G. (1995) Sample preparation of human tumors prior to two-dimensional electrophoresis of proteins. Electrophoresis 16, 1087–1089.

10. Emmert-Buck M.R., Bonner R.F., Smith P.D., Chuaqui R.F., Zhuang Z., Goldstein S.R., Weiss R.A. and Liotta L.A. (1996) Laser capture microdissection. Science 274, 998–1001.

11. Bonner R.F., Emmert-Buck M., Cole K., Pohida T., Chuaqui R., Goldstein S. and Liotta L.A. (1997) Laser capture microdissection: molecular analysis of tissue. Science 278, 1481–1483.

12. Ornstein D.K., Gillespie J.W., Paweletz C.P., Duray P.H., Herring J., Vocke C.D., Topalian S.L., Bostwick D.G., Linehan W.M., Petricoin E.F., III and Emmert-Buck M.R. (2000) Proteomic analysis of laser capture microdissected human prostate cancer and in vitro prostate cell lines. Electrophoresis 21, 2235–2242.

13. Jones M.B., Krutzsch H., Shu H., Zhao Y., Liotta L.A., Kohn E.C. and Petricoin E.F., III (2002) Proteomic analysis and identification of new biomarkers and therapeutic targets for invasive ovarian cancer. Proteomics 2, 76–84.



14. Simone N.L., Remaley A.T., Charboneau L., Petricoin E.F., III, Glickman J.W., Emmert-Buck M.R., Fleisher T.A. and Liotta L.A. (2000) Sensitive immunoassay of tissue cell proteins procured by laser capture microdissection. Am J Pathol 156, 445–452.

15. Ornstein D.K., Englert C., Gillespie J.W., Paweletz C.P., Linehan W.M., Emmert-Buck M.R. and Petricoin E.F., III (2000) Characterization of intracellular prostate-specific antigen from laser capture microdissected benign and malignant prostatic epithelium. Clin Cancer Res 6, 353–356.

16. Sauter E.R., Zhu W., Fan X.J., Wassell R.P., Chervoneva I. and Du Bois G.C. (2002) Proteomic analysis of nipple aspirate fluid to detect biologic markers of breast cancer. Br J Cancer 86, 1440–1443.

17. Verma M., Wright G.L., Jr., Hanash S.M., Gopal-Srivastava R. and Srivastava S. (2001) Proteomic approaches within the NCI early detection research network for the discovery and identification of cancer biomarkers. Ann N Y Acad Sci 945, 103–115.

18. Jain K.K. (2002) Recent advances in oncoproteomics. Curr Opin Mol Ther 4, 203–209.

19. Wright G.L., Jr., Cazares L.H., Leung S.M., Nasim S., Adam B.L., Yip T.T., Schellhammer P.F., Gong L. and Vlahou A. (1999) ProteinChip® surface enhanced laser desorption/ionization (SELDI) mass spectrometry: a novel protein biochip technology for detection of prostate cancer biomarkers in complex protein mixtures. Prostate Cancer Prostatic Dis 2, 264–276.

20. Batorfi J., Ye B., Mok S.C., Cseh I., Berkowitz R.S. and Fulop V. (2003) Protein profiling of complete mole and normal placenta using ProteinChip analysis on laser capture microdissected cells. Gynecol Oncol 88, 424–428.

21. Wulfkuhle J.D., Paweletz C.P., Steeg P.S., Petricoin E.F., III and Liotta L. (2003) Proteomic approaches to the diagnosis, treatment, and monitoring of cancer. Adv Exp Med Biol 532, 59–68.

22. Seow T.K., Liang R.C., Leow C.K. and Chung M.C. (2001) Hepatocellular carcinoma: from bedside to proteomics. Proteomics 1, 1249–1263.

23. Yu L.R., Shao X.X., Jiang W.L., Xu D., Chang Y.C., Xu Y.H. and Xia Q.C. (2001) Proteome alterations in human hepatoma cells transfected with antisense epidermal growth factor receptor sequence. Electrophoresis 22, 3001–3008.

24. Yu L.R., Zeng R., Shao X.X., Wang N., Xu Y.H. and Xia Q.C. (2000) Identification of differentially expressed proteins between human hepatoma and normal liver cell lines by two-dimensional electrophoresis and liquid chromatography-ion trap mass spectrometry. Electrophoresis 21, 3058–3068.

25. Ding S.J., Li Y., Tan Y.X., Jiang M.R., Tian B., Liu Y.K., Shao X.X., Ye S.L., Wu J.R., Zeng R., Wang H.Y., Tang Z.Y. and Xia Q.C. (2004) From proteomic analysis to clinical significance: overexpression of cytokeratin 19 correlates with hepatocellular carcinoma metastasis. Mol Cell Proteomics 3(1), 73–81.

26. Gygi S.P., Rist B., Gerber S.A., Turecek F., Gelb M.H. and Aebersold R. (1999) Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat Biotechnol 17, 994–999.



27. Li J., Steen H. and Gygi S.P. (2003) Protein profiling with cleavable isotope-coded affinity tag (cICAT) reagents: the yeast salinity stress response. Mol Cell Proteomics 2(11), 1198–1204.

28. Oda Y., Owa T., Sato T., Boucher B., Daniels S., Yamanaka H., Shinohara Y., Yokoi A., Kuromitsu J. and Nagasu T. (2003) Quantitative chemical proteomics for identifying candidate drug targets. Anal Chem 75, 2159–2165.

29. Hansen K.C., Schmitt-Ulms G., Chalkley R.J., Hirsch J., Baldwin M.A. and Burlingame A.L. (2003) Mass spectrometric analysis of protein mixtures at low levels using cleavable 13C-isotope-coded affinity tag and multidimensional chromatography. Mol Cell Proteomics 2, 299–314.

30. Washburn M.P., Wolters D. and Yates J.R., III (2001) Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat Biotechnol 19, 242–247.

31. Gygi S.P., Corthals G.L., Zhang Y., Rochon Y. and Aebersold R. (2000) Evaluation of two-dimensional gel electrophoresis-based proteome analysis technology. Proc Natl Acad Sci USA 97, 9390–9395.

32. Kyte J. and Doolittle R.F. (1982) A simple method for displaying the hydropathic character of a protein. J Mol Biol 157, 105–132.

33. Craven R.A., Totty N., Harnden P., Selby P.J. and Banks R.E. (2002) Laser capture microdissection and two-dimensional polyacrylamide gel electrophoresis: evaluation of tissue preparation and sample limitations. Am J Pathol 160, 815–822.

34. Li C., Hong Y., Tan Y.X., Zhou H., Ai J.H., Li S.J., Zhang L., Xia Q.C., Wu J.R., Wang Y. and Zeng R. (2004) Accurate qualitative and quantitative proteomic analysis of clinical hepatocellular carcinoma using laser capture microdissection coupled with isotope-coded affinity tag and two-dimensional liquid chromatography mass spectrometry. Mol Cell Proteomics 3(4), 399–409.


12

Label-Free LC-MS Method for the Identification of Biomarkers

Richard E. Higgs, Michael D. Knierman, Valentina Gelfanova, Jon P. Butler, and John E. Hale

Summary

Pharmaceutical companies and regulatory agencies are pursuing biomarkers as a means to increase the productivity of drug development. Quantifying differential protein levels in complex biological samples such as plasma or cerebrospinal fluid is one approach being used to identify markers of drug action, efficacy, toxicity, and so on. Academic investigators are also interested in markers that are diagnostic or prognostic of disease states. We report a comprehensive, fully automated, label-free approach to relative protein quantification. It covers sample preparation, proteolytic protein digestion, LC-MS/MS data acquisition, de-noising, mass and charge state estimation, chromatographic alignment, and peptide quantification by integration of extracted ion chromatograms. We also describe methods for transforming and normalizing the quantitative peptide levels in multiplexed measurements to improve precision for statistical analysis. Finally, we outline how the described methods can be used to design and power biomarker discovery studies.

Key Words: relative quantification; label-free quantification; biomarkers; proteomics; LC-MS/MS.

1. Introduction

Recent advances in analytical technology, particularly mass spectrometry, are finding broad application in the search for biomarkers. Biomarkers may be defined as indicators of biological processes and encompass a variety of measures, including imaging, polynucleotides, proteins, and small molecule

From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols
Edited by: A. Vlahou © Humana Press, Totowa, NJ


210 Higgs et al.<br />

metabolites, among others. These new biomarker discovery activities are motivated by the need to improve diagnosis, guide targeted therapies, and monitor therapeutic efficacy and toxicity throughout a treatment regimen. Biomarkers of drug efficacy or toxicity have the potential to shorten the drug development timeline because they may provide early indications of a drug's activity. This potential for increased drug development productivity from high-quality biomarkers has drawn increasing attention from pharmaceutical and biotechnology companies and regulatory agencies alike (1,2). Within the field of protein biomarkers, mass spectrometry plays a central role in biomarker discovery from various biological sample matrices. Quantification of small organic molecules using extracted ion chromatograms (XICs) from liquid chromatography mass spectrometry (LC-MS) experiments has a long history in analytical chemistry. Similar techniques using LC-MS experiments with proteolytic protein digests are now routinely applied to quantify peptide and protein levels in biological samples. Early LC-MS peptide quantification methods modified the peptides from one sample with reagents enriched in stable isotopes. The resulting mass shifts allowed relative peptide levels to be compared with those in another, unlabeled sample (3,4). Several factors have limited the application of these label-based methods and motivated the search for label-free methods of non-targeted protein profiling. These factors are the number of biological samples required for statistical power in many applications, the restriction that study samples must be paired or pooled, and the added cost of specialized reagents.

We report here a comprehensive analytical system to collect and automatically process the data from non-targeted LC-MS/MS analyses of complex protein mixtures. In contrast to pattern-based (5,6), difference-based (7), or identification-based (8,9) quantification methods, the approach presented here simply integrates the peptide parent ion current to obtain a relative peptide level in each study sample. No labeling or pooling of study samples is required. The output of this approach is an N × P table in which each of P peptides has been quantified in each of the N study samples. This table maximizes flexibility in downstream statistical analysis, including transformation, normalization, and an analysis suited to the experimental design.
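The N × P table described above is a natural input for transformation and normalization. This section does not give the chapter's specific procedures. The Python sketch below therefore uses two common choices as stand-in assumptions: a log2 transform of the integrated ion currents, followed by per-sample median centering. Rows are study samples, columns are peptides, and the intensities are invented.

```python
import math
from statistics import median


def log2_median_normalize(table):
    """Log2-transform an N x P intensity table and median-center each sample.

    table: list of N rows (study samples), each a list of P positive peptide
    intensities. Returns a new table on the log2 scale in which every sample
    has the same median (the median of the original sample medians).
    """
    if not table or len({len(row) for row in table}) != 1:
        raise ValueError("table must be a non-empty rectangular N x P list")
    logged = [[math.log2(x) for x in row] for row in table]  # x must be > 0
    sample_medians = [median(row) for row in logged]
    target = median(sample_medians)
    return [[v - m + target for v in row]
            for row, m in zip(logged, sample_medians)]


if __name__ == "__main__":
    table = [
        [100.0, 200.0, 400.0],  # sample 1 (hypothetical intensities)
        [200.0, 400.0, 800.0],  # sample 2: same profile, twice the loading
    ]
    for row in log2_median_normalize(table):
        print([round(v, 3) for v in row])
```

In the example, a uniform twofold loading difference between the two samples disappears after centering. The relative peptide levels within each sample are preserved.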

The described method is based on the collective efforts of the applied biochemistry and statistics groups within Lilly Research Laboratories (10–12). Because this is a broad, discovery-oriented assay, it is important to note the limitations the approach imposes. Compared with a targeted assay designed for a particular analyte, an assay designed to detect and quantify many analytes simultaneously compromises on sensitivity, selectivity, dynamic range, and absolute quantification. Ion suppression and co-elution of peptides from complex mixtures can interfere with the ion current attributed to a peptide, thus confounding any inference about the relative quantities of


Label-Free Biomarker Identification 211<br />

the peptide. The limited dynamic range of these uncalibrated assays tends to underestimate the magnitude of a change in protein level for peptides that do not lie on the linear portion of the instrument response curve. Nonetheless, these non-targeted methods have shown promise in identifying relative changes in protein levels. Such changes can then be followed up in subsequent studies with more targeted assays (e.g., multiple reaction monitoring) (13) to verify the findings in a new sample set.
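The point about limited dynamic range can be made concrete with a toy response curve. The Python sketch below assumes a hypothetical saturating (hyperbolic) detector response, signal = c / (K + c). This is not a model of any particular instrument. It shows that the same true fourfold change appears smaller as the measurement moves toward saturation, that is, as the half-saturation constant K shrinks relative to the analyte concentration.

```python
def response(concentration: float, half_saturation: float) -> float:
    """Hypothetical saturating detector response (maximum signal = 1)."""
    return concentration / (half_saturation + concentration)


def observed_fold_change(low_conc: float, true_fold: float,
                         half_saturation: float) -> float:
    """Ratio of responses for two concentrations that differ by true_fold."""
    high = response(low_conc * true_fold, half_saturation)
    low = response(low_conc, half_saturation)
    return high / low


if __name__ == "__main__":
    # The same true fourfold change, measured closer and closer to saturation.
    for k in (100.0, 10.0, 1.0):
        print(k, round(observed_fold_change(1.0, 4.0, k), 2))
    # Prints roughly 3.88, 3.14 and 1.6 for the three cases.
```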

The described method focuses on biomarker discovery from human plasma and cerebrospinal fluid (CSF). Biomarker discovery from these fluids has proven challenging. Highly abundant proteins (e.g., albumin, IgG) are difficult to remove completely and tend to mask the detection of lower abundance proteins that may be directly associated with the biology of interest. However, the analytical and statistical methods described here apply directly to more targeted sample matrices (e.g., tissues) in both clinical and pre-clinical models. Such samples may increase the probability of technical success because they are more directly associated with the biology of interest and contain fewer abundant, masking proteins to remove. Sample collection and handling procedures are critical for reducing overall variability in biomarker discovery studies. Age, gender, diet, time of day, and medication may affect the plasma or CSF protein profile and should be considered in study design. Similarly, consistent sample handling tailored to proteomic profiling is important to ensure high-quality starting material. Relevant factors include preservatives, rapid sample freezing, controlling for blood contamination in CSF sampling, and the number of freeze-thaw cycles. The proteome is arguably the most modulated class of biomolecules in disease, treatment, and toxicity, which underlies the promise of proteomics for biomarker discovery. Despite this promise and rapid advances in technology, progress has been slow (14,15). However, the potential to fulfill this promise is high if the right technology is strategically applied to the appropriate stage of the biomarker discovery life cycle (16). A refined strategy involves (1) applying non-targeted, hypothesis-generating methods like those described here to sample matrices proximal to the biology, (2) using targeted MS assays to verify early discoveries in new sample sets, and (3) performing clinical validation with established diagnostic assay formats (e.g., ELISAs).

2. Materials

2.1. Albumin/IgG Depletion

1. Montage equilibration buffer, wash buffer, and columns are provided with the Montage Albumin Deplete Kit (Millipore®).

2. Protein G-Sepharose (Amersham Biosciences®).



2.2. Reduction, Alkylation, and Digestion

1. Denaturing solution and internal standard: 8 M urea in 100 mM (NH4)2CO3 buffer containing chicken lysozyme (Sigma, St Louis, MO; 10.4 μg/mL), pH 11.0.

2. Reduction/alkylation cocktail: 97.5% ACN, 2% iodoethanol, and 0.5% triethylphosphine (v/v).

3. Trypsin solution: TPCK-treated bovine pancreatic trypsin (Worthington, Lakewood, NJ) is dissolved at 1 mg/mL in H2O and stored in single-use aliquots at –80°C. Before use, working solutions are prepared by diluting to 5 μg/mL in 100 mM ammonium bicarbonate, pH 8.0.
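Preparing the working trypsin solution is a standard C1V1 = C2V2 dilution. Below is a minimal Python sketch. The 600 μL final volume is taken from the digestion steps in Sections 3.1.2 and 3.2.2, and the function name is illustrative.

```python
def stock_volume(stock_conc: float, final_conc: float, final_volume: float) -> float:
    """Volume of stock needed so that C1 * V1 = C2 * V2.

    Concentrations must share units (here ug/mL); the result has the units
    of final_volume.
    """
    if stock_conc <= 0 or final_conc < 0 or final_volume < 0:
        raise ValueError("stock must be > 0; other values must be non-negative")
    if final_conc > stock_conc:
        raise ValueError("final concentration cannot exceed stock concentration")
    return final_conc * final_volume / stock_conc


if __name__ == "__main__":
    stock = 1000.0   # 1 mg/mL trypsin stock, expressed in ug/mL
    working = 5.0    # working concentration, ug/mL
    volume = 600.0   # uL of working solution used per sample
    v1 = stock_volume(stock, working, volume)
    print(f"{v1} uL stock + {volume - v1} uL 100 mM ammonium bicarbonate")
```

For one 600 μL digestion this gives 3 μL of stock diluted with 597 μL of buffer, that is, a 200-fold dilution.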

2.3. HPLC

1. C18 reversed-phase column: Zorbax SB300, 1 × 50 mm (Agilent).

2. Solvent A: 0.1% formic acid (Aldrich) in water (Burdick and Jackson HPLC grade).

3. Solvent B: 50% acetonitrile, 0.1% formic acid (Aldrich) in water (Burdick and Jackson HPLC grade).

4. Solvent C: 80% acetonitrile, 0.1% formic acid (Aldrich) in water (Burdick and Jackson HPLC grade).

2.4. Mass Spectrometry

1. LTQ ion trap mass spectrometer (ThermoFinnigan).

3. Methods

3.1. Plasma Sample Preparation

3.1.1. Albumin/IgG Depletion

1. Dilute a 25 μL aliquot of plasma (1.25 mg of protein, assuming a total protein concentration of 50 mg/mL) with Montage equilibration buffer to a volume of 200 μL (see Note 1).

2. Add 100 μL of a 50% Protein G-Sepharose bead suspension and rock the mixture for 1 h at RT.

3. Pellet the Protein G-Sepharose beads at 2000 rpm for 2 min and transfer 200 μL of the supernatant to a pre-equilibrated Montage column. Pre-equilibrate the column with 400 μL of equilibration buffer by centrifugation for 2 min at 500 × g (see Note 2).

4. Centrifuge the Montage column at 500 × g for 2 min, re-apply the flow-through to the column, and centrifuge again. Pass two consecutive 200 μL washes of Montage wash buffer over the column by centrifugation at 500 × g for 2 min (final volume approximately 600 μL).



3.1.2. Reduction, Alkylation, and Digestion

1. Mix a 120 μL aliquot of the diluted, depleted plasma with 120 μL of the denaturing and internal standard solution (see Note 3).

2. Add an equal volume (240 μL) of reduction/alkylation cocktail (see Note 4).

3. Cap the tubes and incubate for 1 h at 37°C.

4. Evaporate the solutions to dryness in a vacuum centrifuge (at least 3 h).

5. Redissolve the residue in 600 μL of the working trypsin solution and digest overnight at 37°C (17).

3.2. Cerebrospinal Fluid Sample Preparation

3.2.1. Albumin/IgG Depletion

1. Dilute an aliquot of CSF (34 μg of protein, based on a Bradford total protein assay) with Montage equilibration buffer to a volume of 200 μL (see Note 5).

2. Add 100 μL of a 50% Protein G-Sepharose bead suspension and rock the mixture for 1 h at RT.

3. Pellet the Protein G-Sepharose beads at 2000 rpm for 2 min and transfer 200 μL of the supernatant to a pre-equilibrated Montage column. Pre-equilibrate the column with 400 μL of equilibration buffer by centrifugation for 2 min at 500 × g (see Note 2).

4. Centrifuge the Montage column at 500 × g for 2 min, re-apply the flow-through to the column, and centrifuge again. Pass two consecutive 200 μL washes of Montage wash buffer over the column by centrifugation at 500 × g for 2 min (final volume approximately 600 μL).

3.2.2. Reduction, Alkylation, and Digestion

1. Concentrate the CSF samples to approximately 30–50 μL in a vacuum centrifuge and mix with 40 μL of the denaturing and internal standard solution (see Note 3).

2. Add 100 μL of reduction/alkylation cocktail (see Note 4).

3. Cap the tubes and incubate for 1 h at 37°C.

4. Evaporate the solutions to dryness in a vacuum centrifuge (at least 3 h).

5. Redissolve the residue in 600 μL of the working trypsin solution and digest overnight at 37°C (17).

3.3. HPLC Conditions

1. A Surveyor autosampler and MS HPLC pump (ThermoFinnigan) are used for separation. Inject 100 μL of tryptic digest (equivalent to 4.2 μg of non-depleted plasma protein or 14 μg of non-depleted CSF protein) onto the reversed-phase column at a flow rate of 50 μL/min (see Note 6). The gradient program is as follows:

- 10–95% B (90–5% A) over 120 min.
- A 0.1 min ramp to 100% C.
- 5 min at 100% C.
- A 0.1 min ramp to 10% B (90% A).
- A 17 min hold at 10% B (90% A).

The effluent is diverted to waste for the first 2 min to keep the mass spectrometer source clean.

2. Between samples, a water blank is injected and run on a shortened (60 min) version of the same gradient to reduce carryover.

3.4. Mass Spectrometer Conditions

1. The total column effluent (50 μL/min) is directed to the electrospray interface of the ion trap mass spectrometer.

2. The source is operated in positive ion mode with a 4.8 kV electrospray potential, a sheath gas flow of 20 arbitrary units, and a capillary temperature of 225°C. Set the source lenses by maximizing the ion current for the 2+ charge state of angiotensin.

3. Data are collected in triple-play mode with the following parameters: centroid parent scan set to one microscan and 50 ms maximum injection time, profile zoom scan set to three microscans and 500 ms maximum injection time, and centroid MS/MS scan set to two microscans and 2000 ms maximum injection time (see Note 7).

4. Dynamic exclusion is set to a repeat count of one, an exclusion list duration of 2 min, and rejection widths of −0.75 m/z and +2.0 m/z.

5. Collisional activation is carried out with a relative collision energy of 35% and an exclusion width of 3 m/z.

6. Inject study samples in random order to reduce the effects of carryover and to avoid confounding study factors with injection order (see Note 8).

7. Analyze all water blank samples on the mass spectrometer in the same manner as study samples in order to monitor carryover (see Note 9).

3.5. Zoom Scan Data Processing

The data collected from a zoom scan triple-play experiment are used to estimate the quality of the subsequent MS/MS spectrum, the charge state of the peptide, and the monoisotopic and average mass of the peptide. The quality estimate is used to exclude scan events triggered by noise or small molecules from further downstream processing. Peptide mass and charge state estimates are used in subsequent steps for peptide identification. Eliminating low-quality scan events and estimating the charge state and mass of peptides more accurately ultimately reduces the number of false positives that must be dealt with at the peptide identification stage of the process.

1. Assume the charge state of the detected peptide is 1+.

2. Given the m/z of the scan event and the assumed charge state, estimate the theoretical isotope distribution intensities for a peptide of the hypothesized mass using the relationships given in Fig. 1 (see Note 10). Begin by determining the relative intensity of the ¹²C peak ($I_0$) using the relationship in Fig. 1A and the MW for the assumed charge state. Next, estimate the relative intensity of the ¹³C peak ($I_1$) by multiplying the estimate of $I_0$ by the $I_1/I_0$ ratio from Fig. 1B at the same MW. Isotope intensities $I_2$ and $I_3$ are derived in a similar manner using the ratios from Fig. 1C and D at the MW for the assumed charge state.

Label-Free Biomarker Identification 215

3. Convolve the estimated theoretical isotope stick spectrum with a Gaussian peak shape whose width is similar to that produced in a typical zoom scan spectrum (18). Linearly scale the result of this convolution so that its maximum value is one.

[Fig. 1, panels (A)–(D): each isotope intensity ratio defined below plotted against peptide monoisotopic MW (Mono MW), approximately 500–2500.]

Fig. 1. Empirically derived relationships (from 15,493 example peptides) between isotope peak intensities, used to estimate the theoretical isotope pattern for a peptide.

(A) $I_0/\max(I_0,I_1,I_2,I_3)$, non-linear least squares fit:

$$
\frac{I_0}{\max(I_0,I_1,I_2,I_3)} =
\begin{cases}
1 & \text{if } MW < 1800 \\
e^{-0.00132\,(MW-1800)^{0.865}} & \text{if } MW \ge 1800
\end{cases}
$$

(B) $I_1/I_0$, linear least squares fit:

$$
I_1/I_0 = -0.00498 + 0.000560\,MW
$$

(C) $I_2/I_0$, linear least squares fit:

$$
I_2/I_0 = -0.367 + 0.000516\,MW + 1.59\times10^{-7}\,(MW - 1527.34)^2
$$

(D) $I_3/I_0$, nonlinear least squares fit:

$$
I_3/I_0 = 0.0000605\, e^{\,0.00251\,MW - 2.70\times10^{-7}\,MW^2}
$$

Reprinted with permission from (10).



4. Convolve the result from step 3 with the measured zoom scan to obtain the matched filter output between the zoom scan spectrum expected for the assumed charge state and the measured zoom scan spectrum. Record the maximum value of this convolution output along with the x-axis (m/z) value at which the maximum occurs.

5. Repeat steps 2–4 for assumed charge states of 2+, 3+, and 4+. The charge state and mass of the detected peptide are estimated from the best match between the observed zoom scan spectrum and the theoretically derived spectra for the possible charge states of 1+, 2+, 3+, and 4+. The cross-correlation between the measured zoom scan and the best-matching theoretical isotope pattern, evaluated at the m/z shift associated with the convolution maximum, is used as an intensity-independent matching score. Triple-play events with a cross-correlation score greater than 0.6 are retained for identification. Triple plays below this threshold represent scans that are not peptides, a mixture of several peptides in the ion trap, or very low signal-to-noise measurements; these lower quality scan events are excluded from any further processing.
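Steps 1–5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the zoom scan is assumed to be resampled onto a uniform m/z grid (`measured`, starting at `mz0` with spacing `dx`), the trigger m/z is converted to a neutral monoisotopic MW as z × (m/z − proton mass), the Gaussian width `sigma = 0.08` m/z is a hypothetical default, the matched filter is implemented as a sliding dot product (cross-correlation, i.e., convolution with the mirrored template), and the charge state whose template correlates best is chosen. The isotope ratios are the Fig. 1 fits as printed; all function names are illustrative.

```python
import math

PROTON = 1.007276      # Da; assumption: neutral monoisotopic MW = z * (m/z - PROTON)
C13_SPACING = 1.00336  # Da between successive 13C isotope peaks

def isotope_pattern(mw):
    """Relative intensities [I0, I1, I2, I3] from the Fig. 1 fits (step 2)."""
    i0 = 1.0 if mw < 1800 else math.exp(-0.00132 * (mw - 1800) ** 0.865)  # Fig. 1A
    r1 = -0.00498 + 0.000560 * mw                                          # Fig. 1B
    r2 = -0.367 + 0.000516 * mw + 1.59e-7 * (mw - 1527.34) ** 2            # Fig. 1C
    r3 = 0.0000605 * math.exp(0.00251 * mw - 2.70e-7 * mw ** 2)            # Fig. 1D
    return [i0, i0 * r1, i0 * r2, i0 * r3]

def gaussian_template(pattern, z, dx, sigma):
    """Step 3: isotope sticks convolved with a Gaussian, scaled so the maximum is 1.
    Returns (template, pad); template[pad] sits on the monoisotopic stick."""
    spacing = C13_SPACING / z
    pad = int(round(4 * sigma / dx))
    n = pad + int(round((3 * spacing + 4 * sigma) / dx)) + 1
    tpl = []
    for j in range(n):
        x = (j - pad) * dx
        tpl.append(sum(h * math.exp(-0.5 * ((x - k * spacing) / sigma) ** 2)
                       for k, h in enumerate(pattern)))
    top = max(tpl)
    return [v / top for v in tpl], pad

def pearson(a, b):
    """Pearson correlation; 0.0 when either vector is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / math.sqrt(va * vb)

def score_charge(measured, mz0, dx, precursor_mz, z, sigma):
    """Step 4 for one assumed charge: returns (correlation score, monoisotopic m/z)."""
    tpl, pad = gaussian_template(isotope_pattern(z * (precursor_mz - PROTON)), z, dx, sigma)
    best_s, best_val = None, -math.inf
    for s in range(len(measured) - len(tpl) + 1):
        val = sum(t * m for t, m in zip(tpl, measured[s:s + len(tpl)]))
        if val > best_val:
            best_s, best_val = s, val
    if best_s is None:  # template longer than the zoom scan window
        return 0.0, None
    corr = pearson(tpl, measured[best_s:best_s + len(tpl)])
    return corr, mz0 + (best_s + pad) * dx

def estimate_charge(measured, mz0, dx, precursor_mz, sigma=0.08):
    """Step 5: try 1+..4+ and keep the charge whose template correlates best."""
    results = [(score_charge(measured, mz0, dx, precursor_mz, z, sigma), z)
               for z in (1, 2, 3, 4)]
    (corr, mono_mz), z = max(results, key=lambda r: r[0][0])
    return z, corr, mono_mz
```

A triple-play event would then be kept when the returned correlation exceeds 0.6. Choosing the charge by correlation rather than by the raw matched-filter maximum is a design assumption: the correlation is intensity-independent, which is the property the text relies on for the 0.6 cutoff.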

3.6. MS/MS Spectral Filtering

To reduce the effect of MS/MS noise peaks on peptide identification, a dynamic MS/MS noise level is estimated for each spectrum. This noise level estimate is then subtracted from all MS/MS peak intensities, and any resulting negative values are set to zero. The spectral noise level is estimated based on the observation that ideal MS/MS spectra of peptides (theoretical or high signal-to-noise spectra) have relatively few peaks (e.g., y-ions, b-ions, adducts), whereas noisy MS/MS spectra typically have a high density of peaks within a local m/z neighborhood (interpreted as chemical noise). The filtering approach therefore uses a percentile of the peak intensities within a local m/z neighborhood as the noise estimate, with the percentile chosen according to the peak density in that neighborhood: a higher peak density results in a higher percentile and thus a higher local noise estimate, while a lower peak density results in a lower percentile and a lower noise estimate.

1. Bin the MS/MS spectrum into a vector of equally spaced m/z values (bin width of 0.1 m/z).

2. At 200 equally spaced m/z design points between the minimum and maximum m/z values observed in the MS/MS spectrum, estimate the local peak density by counting the number of non-zero intensities in a ±20 m/z window around each design point. Define the local peak density at each design point as the number of non-zero peaks counted divided by 40 (peaks per m/z).

3. Transform the local peak density values to filtering percentile values using the relationship shown in Fig. 2.



Fig. 2. Filtering percentile as a function of local MS/MS peak density. Peak density is defined as the number of MS/MS peaks in a 40 m/z window divided by 40.

$$
\text{Filtering Percentile} =
\begin{cases}
0 & \text{if PeakDensity} \le 0.1 \\
\dfrac{0.75}{1 + e^{(0.15 - \text{PeakDensity})/0.05}} & \text{if PeakDensity} > 0.1
\end{cases}
$$

Reprinted with permission from (10).

4. Obtain an initial noise level estimate at each of the 200 design points as a percentile of the MS/MS peak intensities in the local ±20 m/z window, where the percentile used at each point is derived from step 3 above (see Note 11).

5. Smooth the initial noise estimates with a Gaussian kernel smoother (150 m/z bandwidth) and interpolate between the 200 design points to obtain the final MS/MS noise estimate at each measured m/z value. Subtract this estimate from the measured MS/MS peak intensities and set any negative values to zero. An example of a high and a low signal-to-noise MS/MS spectrum and the resulting estimated noise levels is shown in Fig. 3.
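A minimal, standard-library-only sketch of steps 1–5 follows. The 0.1 m/z bins, 200 design points, ±20 m/z window, Fig. 2 percentile curve, and 150 m/z bandwidth come from the text; everything else is an assumption: percentiles use linear interpolation between order statistics, a 0 percentile is treated as zero noise, the Gaussian kernel's standard deviation is derived from the bandwidth using the convention of R's `ksmooth` (kernel quartiles at ±0.25 × bandwidth), and the smoothed noise is subtracted from the original, unbinned peaks. Function names are hypothetical.

```python
import bisect
import math

def filtering_percentile(density):
    """Fig. 2: percentile (as a fraction) used for the local noise estimate."""
    if density <= 0.1:
        return 0.0
    return 0.75 / (1.0 + math.exp((0.15 - density) / 0.05))

def percentile(values, q):
    """q-th quantile (0 <= q <= 1), linear interpolation between order statistics."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def interp(x, xs, ys):
    """Linear interpolation of ys(xs) at x, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect.bisect_right(xs, x)
    return ys[j - 1] + (ys[j] - ys[j - 1]) * (x - xs[j - 1]) / (xs[j] - xs[j - 1])

def denoise_msms(peaks, bin_width=0.1, n_points=200, half_window=20.0, bandwidth=150.0):
    """peaks: list of (mz, intensity). Returns the peaks with the local noise subtracted."""
    mzs = [mz for mz, _ in peaks]
    lo, hi = min(mzs), max(mzs)
    if hi == lo:
        return list(peaks)
    # Step 1: sum intensities into fixed-width m/z bins.
    bins = {}
    for mz, inten in peaks:
        k = math.floor(mz / bin_width)
        bins[k] = bins.get(k, 0.0) + inten
    keys = sorted(k for k, v in bins.items() if v > 0)
    bin_mz = [(k + 0.5) * bin_width for k in keys]
    bin_int = [bins[k] for k in keys]
    # Steps 2-4: local peak density -> percentile -> raw noise at each design point.
    design = [lo + i * (hi - lo) / (n_points - 1) for i in range(n_points)]
    raw = []
    for d in design:
        a = bisect.bisect_left(bin_mz, d - half_window)
        b = bisect.bisect_right(bin_mz, d + half_window)
        q = filtering_percentile((b - a) / (2 * half_window))
        raw.append(percentile(bin_int[a:b], q) if q > 0 and b > a else 0.0)
    # Step 5: Gaussian kernel smooth over design points, then interpolate and subtract.
    sigma = 0.25 * bandwidth / 0.6745
    smooth = []
    for d in design:
        w = [math.exp(-0.5 * ((d - e) / sigma) ** 2) for e in design]
        smooth.append(sum(wi * r for wi, r in zip(w, raw)) / sum(w))
    return [(mz, max(0.0, inten - interp(mz, design, smooth))) for mz, inten in peaks]
```

For a sparse spectrum (density at or below 0.1 peaks per m/z everywhere) the noise estimate is zero and the peaks pass through unchanged; for a dense noise floor the 75th-percentile cap in Fig. 2 makes the estimate track the floor rather than the few large fragment peaks.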

[Fig. 3, panels (A) and (B): MS/MS intensity versus m/z for two example spectra.]

Fig. 3. Example MS/MS spectra and their estimated noise levels. In the high-noise spectrum (A), 443 original peaks were reduced to 118 peaks above the estimated noise level; in the lower noise spectrum (B), 589 original peaks were reduced to 173 peaks above the estimated noise level. Reprinted with permission from (10).

3.7. Peptide Identification

A detailed description of peptide identification is beyond the scope of this chapter, but some general discussion is warranted given the importance of the subject and its linkage to quantification with the proposed method. The primary problem in peptide identification is controlling false-positive identifications while maintaining reasonable sensitivity for correct identifications. Our approach uses the outputs of two search engines, Sequest (19) and X! Tandem (20), along with other descriptive features of each identification (e.g., charge state, peptide length) as inputs to a classifier that has been trained to recognize correct identifications (21). The output of the classifier is a unitless score indicative of the likelihood of a correct identification. False-positive identifications are controlled by running the searches against reversed versions of the protein databases and estimating p-values: the probability of observing a model score from the reversed database search that exceeds the observed score from the correct database. P-values alone are insufficient because of the large number of tests (identifications) performed: with a 0.05 p-value cutoff, 5% of all spectra would be declared correctly identified even in the null condition where no MS/MS spectrum truly matches any database peptide. To account for multiple testing, false discovery rates (FDRs), expressed as q-values, are estimated from the p-values for peptide identifications using the method described by Benjamini and Hochberg (22). Peptides with identification q-values below a threshold, say 0.10, are retained for quantification. Proteins identified by only one peptide are visually examined to eliminate obviously incorrect identifications (e.g., fewer than four consecutive y- or b-ions). We estimate that the proportion of false identifications using such a procedure is at most 2%. Overall, the method is similar in strategy to PeptideProphet (23), with the following extensions: multiple search engines are employed, a more flexible classifier (e.g., Random Forests) is used, and statistical significance is estimated from a null distribution of classifier scores derived from reversed database searching instead of by fitting a mixture model to the distribution of classifier output scores. The method is described in detail in Higgs et al. (11).
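The p-value and q-value steps can be sketched as below, assuming the classifier scores from the reversed-database (decoy) search are available as a plain list. Ties between a target and a decoy score are counted as "exceeding", which is a conservative assumption; the adjustment is the standard Benjamini–Hochberg step-up procedure (22). Function names are hypothetical, and the classifier itself is not shown.

```python
import bisect

def decoy_pvalues(target_scores, decoy_scores):
    """Empirical p-value: fraction of reversed-database (decoy) classifier scores
    at least as large as each observed target score."""
    d = sorted(decoy_scores)
    n = len(d)
    return [(n - bisect.bisect_left(d, s)) / n for s in target_scores]

def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running = min(running, pvalues[i] * m / rank)
        q[i] = running
    return q
```

Peptides would then be retained with something like `[pep for pep, qv in zip(peptides, q) if qv < 0.10]`, matching the example threshold in the text.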

In general, we restrict biomarker hypothesis generation to identified peptides. The same relative quantification method can be used with unidentified peptides (MS features), although in practice these features need to be identified to be of practical use to clinicians and biologists. To maximize the coverage of proteins identified in a study, identifications from all samples in the study are pooled and used to create a list of peptides to quantify in each sample. Thus, a peptide needs to be confidently identified in only one sample for its ion current to be quantified in all study samples. Pooling the identifications across all samples in a study significantly increases the number of identifications relative to the number obtained from any single sample.

3.8. Chromatographic Alignment

Because the abundance of individual peptides varies between samples, a peptide may trigger an MS/MS scan in one sample and not in another. The area of this peptide may still be extracted from the primary mass spectrum in each sample. However, doing so requires high-quality chromatographic alignment between the samples so that a consistent region of the extracted ion chromatogram (XIC) is used for integration across all samples in a study. Large biomarker studies can produce chromatographic retention time shifts greater than 1 min between pairs of samples run several days and many samples apart. In our experience, simply expanding the integration window by 1 or 2 min to account for chromatographic variability is not an option, because we are analyzing complex samples with multiple co-eluting peaks at most XIC masses. An expanded integration window that includes multiple peaks masks the quantification of individual peptides, produces values confounded by contributions from multiple peptides, and increases variability. Peak picking is another option, but it was not applied here because of its computational cost and the inherently heuristic nature of peak-picking algorithms, with an associated variability in what is being integrated. We have found a simple pair-wise alignment between each sample and a selected reference sample in the study to work well for numerous biomarker discovery projects. This approach to alignment rests on the following assumptions: (a) the samples included in the study are generally quite similar to each other with respect to their peptide content (i.e., there are many peptides, or landmarks, in common between the samples), (b) the same chromatographic conditions are used for each sample in the study, and (c) within a local region of retention time, the retention time offset between any two samples is approximately constant (see Note 12).

1. Identify the landmarks in the reference sample by taking all triple-play scan events with a zoom scan cross-correlation score of 0.65 or greater. This set of reference sample landmarks will be matched against the other samples in the study.

2. Identify the matching landmarks in a study sample by declaring a landmark match when the sample and reference triple-play events satisfy all of the following: (a) their retention times differ by no more than a user-specified amount (5 min), (b) their peptide charge states match, (c) the m/z values of the monoisotopic peak from the zoom scans differ by no more than a user-specified amount (0.7 Da), (d) the zoom scan cross-correlation coefficient of both peptides to their respective theoretical isotope patterns exceeds a threshold (0.65), and (e) the similarity between the corresponding MS/MS spectra exceeds a threshold (e.g., 0.75). The MS/MS similarity metric is implemented as a cross-correlation coefficient between the two MS/MS spectra after each MS/MS stick spectrum has been convolved with a Gaussian peak shape.

3. For each matching pair of landmarks identified in step 2, generate the XIC for the feature in a local retention time window (e.g., ±5 min of the scan event time) in each sample. Convolve the two XICs (i.e., compute their cross-correlation as a function of time shift) to identify the time shift that maximizes the agreement between the landmark XICs in both samples. Record the time shift and the cross-correlation at the optimal shift for each landmark. The cross-correlation value is used as a weighting factor in the subsequent smoothing step.

4. The optimal time shift values for all landmark pairs between a sample and the reference define a warping function that can be used to transform the retention times of the sample to those of the reference. Estimate a smooth warping function by fitting a weighted loess (24) to the time shift versus retention time values for each sample, using the XIC cross-correlation values from step 3 as weights. The result is a smooth function that transforms a sample's retention times to a common time scale defined by the reference sample (Fig. 4).

5. The loess warping function for a sample is then applied to all the retention times in its chromatogram (landmark or not), so that all samples in a study are projected onto the same retention time scale. The warping function between two samples is generally not monotonic over the entire retention time range, and no restriction on overall monotonicity is imposed on our estimate of it. We do, however, preserve the overall rank order of the retention times following alignment by constraining the bandwidth (span = 0.5) used in the loess fitting (24) (see Note 13).

[Fig. 4: retention time shift (min, approximately −0.5 to 0.5) plotted against retention time (0–120 min) for n = 462 landmarks, with the loess fit overlaid.]

Fig. 4. Example chromatographic alignment ("warping") function between two rat serum samples. Retention time shift (min) vs. retention time (min) for 462 landmark peptides is plotted with the resulting loess fit. Reprinted with permission from (10).
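Steps 3–5 can be sketched as follows, assuming both landmark XICs have already been resampled onto a common retention time grid of spacing `dt` (minutes). The loess here is a local-linear (degree 1) fit with tricube neighbourhood weights multiplied by the XIC cross-correlation weights; that degree, the clipping of negative correlations to zero weight, the zero-shift fallback, and all function names are assumptions rather than details from the text. The shift is defined as sample time minus reference time, so an aligned time is t − shift(t).

```python
import math

def pearson(a, b):
    """Pearson correlation; 0.0 when either vector is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / math.sqrt(va * vb)

def best_shift(ref_xic, sample_xic, dt, max_lag):
    """Step 3: lag maximizing sum(ref[i] * sample[i + lag]).
    Returns (shift in minutes, correlation of the overlapping segments at that lag)."""
    best = None
    for lag in range(-max_lag, max_lag + 1):
        pairs = [(ref_xic[i], sample_xic[i + lag])
                 for i in range(len(ref_xic)) if 0 <= i + lag < len(sample_xic)]
        score = sum(a * b for a, b in pairs)
        if best is None or score > best[0]:
            best = (score, lag, pairs)
    _, lag, pairs = best
    return lag * dt, pearson([a for a, _ in pairs], [b for _, b in pairs])

def loess_at(x0, xs, ys, weights, span=0.5):
    """Step 4: weighted local-linear loess estimate at x0 (tricube neighbourhood)."""
    k = max(2, math.ceil(span * len(xs)))
    h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12
    w = [wi * (1.0 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x, wi in zip(xs, weights)]
    sw = sum(w)
    if sw == 0:  # no usable landmarks nearby: assume zero shift
        return 0.0
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)

def align_times(times, landmark_rt, landmark_shift, landmark_corr, span=0.5):
    """Step 5: map every sample retention time onto the reference time scale."""
    weights = [max(c, 0.0) for c in landmark_corr]  # negative correlations get no weight
    return [t - loess_at(t, landmark_rt, landmark_shift, weights, span) for t in times]
```

Because each landmark's weight is its XIC correlation, poorly matched landmarks pull the warping function less; R's `loess` defaults to a local quadratic, so a production version might prefer that degree.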

3.9. Peptide Quantification

Relative quantification of peptides is carried out by integrating the XIC peak from the primary mass spectrum within each sample, using the normalized retention times from the chromatographic alignment. A list of peptides to integrate within each sample is constructed by pooling all triple-play events across all samples. This pooling can be done with or without peptide identification; as previously noted, we typically restrict the analyses to identified peptides. For each identified peptide, perform the following steps:

1. For each sample in which the peptide was identified, extract the XIC for the peptide and compute its centroid (the weighted average of the aligned retention time values, with the XIC ion current as the weighting factor) in a small retention time neighborhood (−0.5 min to +1.0 min from the triple-play trigger time). Compute the mean centroid time for the peptide over all samples in which the peptide was identified. Also compute the mean, over the same samples, of the average m/z value estimated from the zoom scan spectrum.

2. For each sample in the study, create an XIC for the peptide using the mean zoom scan average m/z value determined in step 1.

3. Estimate a local XIC baseline and subtract it from the XIC intensity values in each sample. A local linear baseline can be estimated by fitting a line between the lowest-intensity XIC point before the peak and the lowest-intensity XIC point after the peak within a local neighborhood (e.g., 5 min). This simple local linear estimate always lies below the signal intensity in the local neighborhood, so the estimated baseline is biased low. For large peaks this bias is negligible, but for small peaks it may have a more pronounced effect on quantification. Alternatively, an asymmetric least squares smoothing approach may be used to estimate the baseline XIC values and reduce the potential bias of the simple local linear approach (25).

4. A fixed retention time window (±0.5 min for the chromatography described here) around the mean centroid time from step 1 is used for integration. The appropriate width of this window depends on the chromatography method used. For the method reported here, the peak width remains relatively constant across the HPLC gradient (i.e., no band broadening is observed). If band broadening is observed, the integration window width should be modeled as a function of retention time (e.g., integration window width = intercept + slope × retention time).

5. Integrate the baseline-corrected XIC values within the fixed retention time window for each sample in the study using a numerical integration algorithm such as the trapezoid rule. Record the XIC area for each peptide in each sample. An example of XIC integration for a small study is shown in Fig. 5.
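Steps 1 and 3–5 for a single peptide in a single sample might look like the following sketch. It assumes aligned retention times in minutes and interprets the 5 min baseline neighbourhood as up to 5 min on each side of the integration window; the asymmetric least squares alternative (25) is not shown, and the function names are illustrative only.

```python
def centroid_time(times, xic, trigger, before=0.5, after=1.0):
    """Step 1: ion-current-weighted mean retention time in [trigger - 0.5, trigger + 1.0]."""
    pts = [(t, y) for t, y in zip(times, xic) if trigger - before <= t <= trigger + after]
    total = sum(y for _, y in pts)
    return sum(t * y for t, y in pts) / total

def linear_baseline(times, xic, center, half_width=0.5, neighborhood=5.0):
    """Step 3: line through the lowest XIC point before and the lowest point after the peak."""
    pre = [(y, t) for t, y in zip(times, xic)
           if center - neighborhood <= t < center - half_width]
    post = [(y, t) for t, y in zip(times, xic)
            if center + half_width < t <= center + neighborhood]
    (y0, t0), (y1, t1) = min(pre), min(post)
    slope = (y1 - y0) / (t1 - t0)
    return [y0 + slope * (t - t0) for t in times]

def xic_area(times, xic, center, half_width=0.5, neighborhood=5.0):
    """Steps 3-5: subtract the baseline, then trapezoid-integrate within center +/- half_width."""
    base = linear_baseline(times, xic, center, half_width, neighborhood)
    pts = [(t, y - b) for t, y, b in zip(times, xic, base) if abs(t - center) <= half_width]
    return sum((t1 - t0) * (y0 + y1) / 2.0 for (t0, y0), (t1, y1) in zip(pts, pts[1:]))
```

In use, `center` would be the mean centroid time across the samples in which the peptide was identified, and `xic_area` would be called once per sample to fill one column of the N × P table described next.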

3.10. Data Transformation and Normalization

Following the integration of peptide-specific XIC peaks in all study samples, we have a rectangular data table with N rows corresponding to the N samples in the study and P columns corresponding to the peptides detected in the study. The cell values in this table are the peptide peak areas. With this table in hand, the usual transformation and normalization operations may be applied before any statistical analysis.

1. Peptide peak areas are approximately log-normally distributed. Apply a log2 transformation to all peak area values (see Note 14).

2. Normalize the log2-transformed peak areas using a quantile normalization procedure (26) (see Note 15).

3. Normalized log2 peptide areas may be used directly as input to the statistical analysis for the study (peptide-level analysis). Additionally, the average of the normalized log2 areas of all the peptides identified from a protein can be used as an overall estimate of the protein level (protein-level analysis; see Note 16).
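The three steps can be sketched on a small N × P table (rows = samples, columns = peptides). This sketch assumes a complete table with no missing or zero areas and breaks ties in the quantile normalization by column position; the helper names are hypothetical.

```python
import math

def log2_table(areas):
    """Step 1: log2-transform every peak area."""
    return [[math.log2(a) for a in row] for row in areas]

def quantile_normalize(table):
    """Step 2: replace each sample's value at rank r by the across-sample mean
    of the rank-r values (ties broken by column position)."""
    n_cols = len(table[0])
    orders = [sorted(range(n_cols), key=lambda j: row[j]) for row in table]
    rank_means = [sum(row[order[r]] for row, order in zip(table, orders)) / len(table)
                  for r in range(n_cols)]
    out = []
    for order in orders:
        new = [0.0] * n_cols
        for r, j in enumerate(order):
            new[j] = rank_means[r]
        out.append(new)
    return out

def protein_level(norm_row, peptide_to_protein):
    """Step 3: average normalized log2 peptide areas per protein for one sample."""
    sums, counts = {}, {}
    for value, prot in zip(norm_row, peptide_to_protein):
        sums[prot] = sums.get(prot, 0.0) + value
        counts[prot] = counts.get(prot, 0) + 1
    return {p: sums[p] / counts[p] for p in sums}
```

After `quantile_normalize`, every sample has exactly the same sorted set of values, which is the "unchanged distribution" assumption discussed in Note 15.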


Fig. 5. XICs of the 2+ α1-macroglobulin peptide ATPLSLCALTAVDQSVLLLKPEAK for eight rat serum samples following chromatographic alignment. Note that the peak from every sample fits within the highlighted [83.2, 84.2] min integration region. Reprinted with permission from (10).



3.11. Study Design, Power, Sample Size, and Analysis

Our strategy of producing an N × P table of relative peptide levels allows the analysis to be done in a manner consistent with the study design. Note that no part of the described method imposes any limitation on the final statistical analysis of the study (e.g., pooling of samples, subtractive or difference-based methods). In general, the statistical analysis used to identify potential protein biomarkers in a study should follow the same approach that a primary clinical endpoint analysis would take (e.g., a simple paired design should be analyzed with a paired t-test, and a crossover design with repeated measures within period should be analyzed as a crossover study with repeated measures within period).

An analysis of a single clinical endpoint may use the familiar type I error threshold of 0.05 as a measure of statistical significance. This approach does not work well when testing hundreds or thousands of proteins in a study because, by definition, about 5% of the p-values from a null experiment (an experiment in which there is truly no treatment or group effect) will be less than 0.05. The Bonferroni approach, which controls the family-wise type I error (the probability of any false-positive declaration in the set of declared changes), has been commonly employed as a means to control false-positive findings (27). However, many investigators doing proteomic hypothesis generation are willing to tolerate some level of false-positive findings in a declared set, as long as that level is relatively low and estimated.

The use of the FDR as a means to identify a set of declared findings with a specified proportion of false positives has been widely applied in genomics (22) and is the current recommendation for proteomic hypothesis-generating experiments. There are numerous estimators of the FDR (28,29); the original method described by Benjamini and Hochberg (22) is used in the work presented here.

Just as multiple comparisons should be considered in the analysis of study data, they should also be considered at the design stage of a new study aimed at generating hypotheses from highly multiplexed measurements such as proteomics. This is a relatively new field of research, with several methods recently reported (30,31,32,33). A simple approach originally suggested by Benjamini and Hochberg (22), and adapted by Bemis (34), uses traditional sample-size calculations with the following expression for the average type I error ($\alpha_{ave}$) over a set of tested hypotheses:

$$
\alpha_{ave} = \frac{f_{ave}\, q^{*}\, m_1}{(m_1 + m_0)(1 - q^{*})}
$$

where $f_{ave}$ is the average power of the hypothesis tests conducted in a study, $q^{*}$ is the rate at which the FDR is to be controlled, $m_0$ is the number of true null hypotheses tested, and $m_1$ is the number of true alternative hypotheses tested. Sample-size estimates are made by first estimating $\alpha_{ave}$ from the desired values of $f_{ave}$ and $q^{*}$ and assumed values of $m_0$ and $m_1$, and then using $\alpha_{ave}$ in existing sample-size calculators for the given study design. An example set of sample-size curves obtained with this approach for the two-sample t-test design is given in Fig. 6.


Fig. 6. Estimated sample size required to detect protein changes in a two-sample t-test design. The number of subjects in each of the two groups is plotted against the detectable effect size expressed as a fold change. Four levels of total variability are shown (10%, 20%, 30%, and 40% CV). Sample-size estimates were made using 85% power, a 0.10 target FDR for declaring significance, and an estimated proportion of true null hypotheses, $m_0/(m_0+m_1)$, of 0.98.
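As a rough illustration, the expression for $\alpha_{ave}$ can be combined with a textbook sample-size formula. The sketch below substitutes a normal approximation to the two-sample t-test for the existing sample-size calculators mentioned above, and assumes that a CV translates into a natural-log-scale standard deviation of $\sqrt{\ln(1+CV^2)}$ (the lognormal relationship); its numbers may therefore differ somewhat from Fig. 6. Function names are hypothetical; `statistics.NormalDist` is in the Python standard library (3.8+).

```python
import math
from statistics import NormalDist

def alpha_ave(power, q_star, m0, m1):
    """Average per-test type I error implied by a target FDR q* (Section 3.11)."""
    return power * q_star * m1 / ((m1 + m0) * (1.0 - q_star))

def n_per_group(fold_change, cv, alpha, power):
    """Two-sample, two-sided normal-approximation sample size on the log scale."""
    sigma = math.sqrt(math.log(1.0 + cv ** 2))  # log-scale SD for a lognormal with this CV
    delta = math.log(fold_change)               # effect size on the same (natural log) scale
    z = NormalDist().inv_cdf
    n = 2.0 * (z(1.0 - alpha / 2.0) + z(power)) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)
```

With the Fig. 6 settings, one would compute `a = alpha_ave(0.85, 0.10, 980, 20)` (so that $m_0/(m_0+m_1) = 0.98$) and then call `n_per_group(fold, cv, a, 0.85)` for each fold change and CV of interest.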

4. Notes

1. We find that plasma total protein concentration, as measured by a Bradford assay, has a total coefficient of variation (CV) of approximately 11% (including inter-subject, intra-subject, and assay error) and ranges between approximately 48 and 68 mg/mL (12). Because plasma total protein concentration appears to be tightly regulated, it is generally not necessary to measure total protein concentration for each sample in a study in order to load a consistent amount of protein.

2. The depletion material used is based on a dye affinity removal method for albumin. Commercially available antibody-based depletion kits may improve albumin removal at a reasonable cost. Abundant protein depletion is an open and active research area at the time of this writing.

3. Chicken lysozyme is added as a spiked internal standard at this stage in order to qualitatively assess the digestion efficiency and to quantitatively assess the measurement error across the samples in a study. Other internal standards could also be used.

4. The reduction/alkylation solution should be prepared just before use. Triethylphosphine is pyrophoric and should be handled in a fume hood in accordance with the material safety data sheet. The use of volatile reagents for this step reduces variability in sample preparation by minimizing sample handling steps and removing the majority of the reducing and alkylating reagents. This matters because the digestion is performed with trypsin, which is sensitive to the presence of reducing reagents.

5. We find that CSF total protein concentration, as measured by a Bradford assay, has a total CV of approximately 27% (including inter-subject, intra-subject, and assay error) and ranges between approximately 0.12 and 0.41 mg/mL (12). The higher overall variability relative to plasma total protein is attributed to a significantly higher inter-subject variability in CSF (12). Because of this higher variability, we use the results of the Bradford total protein assay to process a consistent total CSF protein amount in the proteomics assay.

6. The HPLC pumps must be capable of producing a smooth gradient at 50 μL/min. Verify gradient formation by using water as A and 1% acetone in water as B and running the gradient with UV monitoring at 254 nm. New HPLC columns should be conditioned with at least four runs of digested serum before use in the method.

7. The mass spectrometer's source should be carefully cleaned to minimize chemical noise. Monitor above 300 m/z and try to maximize the injection time, as this is directly proportional to the achievable dynamic range in an ion trap mass spectrometer. The spray conditions should be optimized for a peptide of about 1700 Da.

8. Alternatively, a design could be used to balance various study factors (e.g., treatment, gender, age) with injection order. This approach may be most appropriate for small studies (e.g.,



the +3 ¹³C isotopic peak. The 15,493 example peptides were then used to derive relationships for $I_0/\max(I_0,I_1,I_2,I_3)$, $I_1/I_0$, $I_2/I_0$, and $I_3/I_0$ as functions of the peptide monoisotopic molecular weight (Fig. 1).

11. The percentile transformation defines the noise level as the Xth percentile of the peak intensities in a local m/z neighborhood, where X depends on the peak density in the neighborhood (higher peak density → higher percentile → higher estimated noise level).

12. One potential improvement to this alignment strategy would be to create a<br />

composite list of landmarks across all study samples instead of relying on a single<br />

sample to serve as the retention time reference. This could easily be accomplished<br />

by grouping or clustering landmarks from all samples while enforcing a match on m/z,<br />

charge state, retention time, and MS/MS spectral similarity. This has not been<br />

employed yet due to the increased computational cost and the lack of data demonstrating<br />

any significant problems with the single reference sample approach. In<br />

practice, several different samples are evaluated as potential alignment reference<br />

samples, and the best sample based on a qualitative assessment of the alignment<br />

warping functions is chosen.<br />

13. A visual examination of the alignment warping functions for all samples included<br />

in a study is an effective means to detect and diagnose chromatography problems<br />

encountered in the analysis of dozens of study samples. For example, oscillatory<br />

warping functions have been associated with pump mixing problems, whereas large-magnitude,<br />

mostly linear warping functions have been associated with column<br />

degradation.<br />

14. Log2 is convenient because a unit change can be interpreted as a twofold change<br />

on the original scale.<br />

15. Normalization can be particularly important for minimizing systematic biases in<br />

ion current introduced by sample collection and handling, sample concentration,<br />

instrument sensitivity drift during the course of data acquisition, etc. The spiked<br />

internal standard, chicken lysozyme, can be helpful in diagnosing and monitoring<br />

ion intensities before and after normalization. Quantile normalization assumes<br />

that the overall distribution of log2 peptide peak areas is unchanged from sample<br />

to sample. This is generally a reasonable assumption, but in some cases<br />

a treatment effect may modulate the levels of most of the proteins detected in<br />

a study, and in such cases quantile normalization should not be used. Instead,<br />

the spiked internal standard, chicken lysozyme, can be used for normalization; note<br />

that it corrects only those systematic effects on ion current that occur after the<br />

standard was spiked.<br />

16. In practice, we analyze a study at both the peptide and protein levels.<br />

Peptide-level analyses are generally specific to the identified peptide and allow<br />

the opportunity to discover biologically related changes in peptide level due<br />

to processing of a specific region of a protein. Protein-level analyses provide<br />

additional statistical power to detect smaller magnitude changes in protein levels<br />

since we are averaging multiple peptide values, all of which have a high positive<br />

covariance.
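Note 11 describes the percentile transformation only qualitatively. The sketch below shows one way it could be implemented in Python with NumPy, assuming a fixed ±2 m/z neighborhood and a linear mapping from the local peak count to a percentile between the 10th and 50th. The window width, the percentile range, the density cap, and the name `local_noise_levels` are illustrative assumptions, not parameters from this chapter.

```python
import numpy as np


def local_noise_levels(mz, intensity, half_width=2.0,
                       lo_pct=10.0, hi_pct=50.0, max_density=200):
    """Noise level per peak: the Xth percentile of intensities in a local m/z window.

    X rises linearly with the number of peaks in the window (denser
    neighborhood -> higher percentile -> higher noise estimate).
    half_width, lo_pct, hi_pct and max_density are assumed values.
    """
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    order = np.argsort(mz)
    mz_s, int_s = mz[order], intensity[order]

    noise_sorted = np.empty_like(int_s)
    for i, center in enumerate(mz_s):
        # Indices of peaks within [center - half_width, center + half_width]
        left = np.searchsorted(mz_s, center - half_width, side="left")
        right = np.searchsorted(mz_s, center + half_width, side="right")
        window = int_s[left:right]
        density = min(len(window), max_density) / max_density
        pct = lo_pct + (hi_pct - lo_pct) * density
        noise_sorted[i] = np.percentile(window, pct)

    # Put the estimates back into the caller's original peak order
    noise = np.empty_like(noise_sorted)
    noise[order] = noise_sorted
    return noise
```

An isolated peak is its own window, so its noise estimate equals its intensity; in a crowded region the estimate moves toward the middle of the local intensity distribution.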
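Note 12 proposes, but does not implement, a composite landmark list built by grouping landmarks across samples. The following is a minimal Python sketch under stated assumptions. Each landmark is assumed to carry a sample ID, precursor m/z, charge, retention time, and an MS/MS spectrum already binned onto a shared m/z grid. The greedy seed-based grouping, the tolerances (10 ppm, 2 min, cosine similarity ≥ 0.8), and all names are hypothetical choices, not the authors' method.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class Landmark:
    sample: str
    mz: float          # precursor m/z
    charge: int
    rt: float          # retention time, min
    spectrum: list     # MS/MS intensities on a shared m/z binning (assumed)


def cosine(a, b):
    """Cosine similarity between two binned MS/MS spectra."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def group_landmarks(landmarks, ppm_tol=10.0, rt_tol=2.0, min_cos=0.8):
    """Greedily assign each landmark to the first group whose seed it matches."""
    groups = []
    for lm in sorted(landmarks, key=lambda x: x.mz):
        for g in groups:
            seed = g[0]
            if (lm.charge == seed.charge
                    and abs(lm.mz - seed.mz) <= seed.mz * ppm_tol * 1e-6
                    and abs(lm.rt - seed.rt) <= rt_tol
                    and cosine(lm.spectrum, seed.spectrum) >= min_cos
                    and all(m.sample != lm.sample for m in g)):
                g.append(lm)
                break
        else:
            groups.append([lm])
    return groups


def composite_reference(groups, min_samples=2):
    """Consensus (charge, median m/z, median RT) for landmarks seen in enough samples."""
    ref = [(g[0].charge,
            statistics.median(m.mz for m in g),
            statistics.median(m.rt for m in g))
           for g in groups if len({m.sample for m in g}) >= min_samples]
    return sorted(ref, key=lambda r: r[2])
```

Using medians rather than means keeps one badly shifted sample from dragging the consensus retention time, which matters when the goal is a reference that no single sample dominates.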
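Notes 14 and 15 can be illustrated with a short NumPy sketch. It log2-transforms peak areas and quantile-normalizes a peptides × samples matrix, replacing each value by the across-sample mean of the values with the same rank, which is the idea described by Bolstad et al. (26). As the alternative mentioned in Note 15, it also shifts each sample so that a spiked internal-standard row sits at a common level. The matrix layout, the absence of missing values, the simple tie handling, and the function names are assumptions.

```python
import numpy as np


def log2_areas(areas):
    """Log2-transform peak areas; a difference of 1 means a twofold change."""
    return np.log2(np.asarray(areas, dtype=float))


def quantile_normalize(x):
    """Give every column (sample) of a peptides x samples matrix the same distribution.

    Ties are ranked by order of appearance, a simplification.
    """
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x, axis=0, kind="stable"), axis=0, kind="stable")
    mean_by_rank = np.sort(x, axis=0).mean(axis=1)   # reference distribution
    return mean_by_rank[ranks]


def internal_standard_normalize(log_x, standard_row):
    """Shift each sample so the spiked standard (e.g., lysozyme) sits at its mean level."""
    log_x = np.asarray(log_x, dtype=float)
    offset = log_x[standard_row] - log_x[standard_row].mean()
    return log_x - offset
```

The internal-standard version leaves the shape of each sample's distribution untouched, which is why it remains usable when a treatment shifts most proteins and quantile normalization would erase that shift.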
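The protein-level analysis in Note 16 averages peptide values. A minimal Python sketch follows. It assumes normalized log2 peptide values keyed by peptide and a peptide-to-protein map (both hypothetical structures), and it computes one value per protein per sample as the mean of that protein's peptides.

```python
from collections import defaultdict

import numpy as np


def protein_level(peptide_log2, peptide_to_protein):
    """Average normalized log2 peptide values into protein-level values.

    peptide_log2: {peptide: [value per sample]}
    peptide_to_protein: {peptide: protein}
    """
    by_protein = defaultdict(list)
    for peptide, values in peptide_log2.items():
        by_protein[peptide_to_protein[peptide]].append(np.asarray(values, dtype=float))
    return {protein: np.mean(rows, axis=0) for protein, rows in by_protein.items()}
```

Because the averaging is done on the log2 scale, each protein value is the log2 of the geometric mean of its peptide abundances. Peptides shared between proteins would need a separate rule, which this sketch does not provide.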



Acknowledgments<br />

We thank John Saalwaechter and Andrew Kaczorek and the entire scientific<br />

computing team for their efforts in developing and maintaining a high-availability<br />

grid-computing environment used for this work. We also thank<br />

Jude Onyia and the statistical and mathematical sciences management team for<br />

supporting us in the development of these methods.<br />

References<br />

1. FDA Critical Path Initiative 2006 (http://www.fda.gov/oc/initiatives/criticalpath).<br />

2. NIH Road Map for Medical Research 2006 (http://www.nihroadmap.nih.gov/index.asp).<br />

3. Gygi, S.P., Rist, B., Gerber, S.A., Turecek, F., Gelb, M.H., and Aebersold, R. 1999.<br />

Quantitative analysis of complex protein mixtures using isotope-coded affinity<br />

tags. Nat. Biotechnol. 17: 994–999.<br />

4. Aggarwal, K., Choe, L.H., and Lee, K.H. 2006. Shotgun proteomics using the<br />

iTRAQ isobaric tags. Brief. Funct. Genomic. Proteomic. 5: 112–120.<br />

5. Petricoin, E.F., Ardekani, A.M., Hitt, B.A., Levine, P.J., Fusaro, V.A.,<br />

Steinberg, S.M., Mills, G.B., Simone, C., Fishman, D.A., Kohn, E.C., et al. 2002. Use<br />

of proteomic patterns in serum to identify ovarian cancer. Lancet 359: 572–577.<br />

6. Radulovic, D., Jelveh, S., Ryu, S., Hamilton, T.G., Foss, E., Mao, Y., and Emili, A.<br />

2004. Informatics platform for global proteomic profiling and biomarker discovery<br />

using liquid chromatography-tandem mass spectrometry. Mol. Cell. Proteomics 3:<br />

984–997.<br />

7. Wiener, M.C., Sachs, J.R., Deyanova, E.G., and Yates, N.A. 2004. Differential<br />

mass spectrometry: a label-free LC-MS method for finding significant differences<br />

in complex peptide and protein mixtures. Anal. Chem. 76: 6085–6096.<br />

8. Gao, J., Opiteck, G.J., Friedrichs, M.S., Dongre, A.R., and Hefta, S.A. 2003.<br />

Changes in the protein expression of yeast as a function of carbon source.<br />

J. Proteome Res. 2: 643–649.<br />

9. Colinge, J., Chiappe, D., Lagache, S., Moniatte, M., and Bougueleret, L. 2005.<br />

Differential proteomics via probabilistic peptide identification scores. Anal. Chem.<br />

77: 596–606.<br />

10. Higgs, R.E., Knierman, M.D., Gelfanova, V., Butler, J.P., and Hale, J.E. 2005.<br />

Comprehensive label-free method for the relative quantification of proteins from<br />

biological samples. J. Proteome Res. 4: 1442–1450.<br />

11. Higgs, R.E., Knierman, M.D., Freeman, A.B., Gelbert, L.M., Patil, S.T., and<br />

Hale, J.E. 2007. Estimating the statistical significance of peptide identifications<br />

from shotgun proteomics experiments. J. Proteome Res. 6: 1758–1767.<br />

12. Patil, S.T., Higgs, R.E., Brandt, J.E., Knierman, M.D., Gelfanova, V., Butler, J.P.,<br />

Downing, A.M., Dorocke, J., Dean, R.A., Potter, W.Z. et al. 2007. Identifying<br />

pharmacodynamic protein markers of centrally active drugs in humans: a pilot<br />

study in a novel clinical model. J. Proteome Res. 6: 955–966.



13. Anderson, L., and Hunter, C.L. 2006. Quantitative mass spectrometric multiple<br />

reaction monitoring assays for major plasma proteins. Mol. Cell. Proteomics 5:<br />

573–588.<br />

14. Anderson, N.L., and Anderson, N.G. 2002. The human plasma proteome: history,<br />

character, and diagnostic prospects. Mol. Cell. Proteomics 1: 845–867.<br />

15. Gutman, S., and Kessler, L.G. 2006. The US Food and Drug Administration<br />

perspective on cancer biomarker development. Nat. Rev. Cancer 6: 565–571.<br />

16. Rifai, N., Gillette, M.A., and Carr, S.A. 2006. Protein biomarker discovery and<br />

validation: the long and uncertain path to clinical utility. Nat. Biotechnol. 24:<br />

971–983.<br />

17. Hale, J.E., Butler, J.P., Gelfanova, V., You, J.S., and Knierman, M.D. 2004.<br />

A simplified procedure for the reduction and alkylation of cysteine residues in<br />

proteins prior to proteolytic digestion and mass spectral analysis. Anal. Biochem.<br />

333: 174–181.<br />

18. Proakis, J.G., and Manolakis, D.G. 1992. Digital Signal Processing – Principles,<br />

Algorithms and Applications. Prentice Hall, New York, NY.<br />

19. Eng, J.K., McCormack, A.L., and Yates, J.R. 1994. An approach to correlate tandem<br />

mass spectral data of peptides with amino acid sequences in a protein database.<br />

J. Am. Soc. Mass Spectrom. 5: 976–989.<br />

20. Craig, R., and Beavis, R.C. 2003. A method for reducing the time required to match<br />

protein sequences with tandem mass spectra. Rapid Commun. Mass Spectrom. 17:<br />

2310–2316.<br />

21. Ulintz, P.J., Zhu, J., Qin, Z.S., and Andrews, P.C. 2006. Improved classification<br />

of mass spectrometry database search results using newer machine learning<br />

approaches. Mol. Cell. Proteomics 5: 497–509.<br />

22. Benjamini, Y., and Hochberg, Y. 1995. Controlling the false discovery rate: a<br />

practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B<br />

57: 289–300.<br />

23. Keller, A., Nesvizhskii, A.I., Kolker, E., and Aebersold, R. 2002. Empirical statistical<br />

model to estimate the accuracy of peptide identifications made by MS/MS<br />

and database search. Anal. Chem. 74: 5383–5392.<br />

24. Cleveland, W.S., Grosse, E., and Shyu, W.M. 1992. Local regression models.<br />

In Statistical Models in S. J.M. Chambers and T.J. Hastie, eds. Wadsworth &<br />

Brooks/Cole, Pacific Grove, CA.<br />

25. Boelens, H.F., Dijkstra, R.J., Eilers, P.H., Fitzpatrick, F., and Westerhuis, J.A. 2004.<br />

New background correction method for liquid chromatography with diode array<br />

detection, infrared spectroscopic detection and Raman spectroscopic detection.<br />

J. Chromatogr. A 1057: 21–30.<br />

26. Bolstad, B.M., Irizarry, R.A., Astrand, M., and Speed, T.P. 2003. A comparison<br />

of normalization methods for high density oligonucleotide array data based on<br />

variance and bias. Bioinformatics 19: 185–193.<br />

27. Miller, R.G., Jr. 1991. Simultaneous Statistical Inference. Springer-Verlag,<br />

New York.



28. Butler, K.W., Deslauriers, R., Geoffrion, Y., Storey, J.M., Storey, K.B., Smith, I.C.,<br />

and Somorjai, R.L. 1985. 31P nuclear magnetic resonance studies of crayfish<br />

(Orconectes virilis). The use of inversion spin transfer to monitor enzyme kinetics<br />

in vivo. Eur. J. Biochem. 149: 79–83.<br />

29. Efron, B. 2004. Large-scale simultaneous hypothesis testing: the choice of a null<br />

distribution. J. Am. Stat. Assoc. 99: 96–104.<br />

30. Pounds, S., and Cheng, C. 2005. Sample size determination for the false discovery<br />

rate. Bioinformatics 21: 4263–4271.<br />

31. Hu, J., Zou, F., and Wright, F.A. 2005. Practical FDR-based sample size calculations<br />

in microarray experiments. Bioinformatics 21: 3264–3272.<br />

32. Jung, S.H. 2005. Sample size for FDR-control in microarray data analysis. Bioinformatics<br />

21: 3097–3104.<br />

33. Li, S.S., Bigler, J., Lampe, J.W., Potter, J.D., and Feng, Z. 2005. FDR-controlling<br />

testing procedures and sample size determination for microarrays. Stat. Med. 24:<br />

2267–2280.<br />

34. Bemis, K.G. 2005. Statistical Issues with Mass Spectrometry Proteomics for<br />

Biomarker Discovery. In International Workshop on Statistical Methodology in<br />

Clinical and Nonclinical R&D, DIA Conference, Nice, France.


13<br />

Analysis of the Extracellular Matrix and Secreted Vesicle<br />

Proteomes by Mass Spectrometry<br />

Zhen Xiao, Thomas P. Conrads, George R. Beck, Jr.,<br />

and Timothy D. Veenstra<br />

Summary<br />

The extracellular matrix (ECM) and secreted vesicles are unique structures outside of<br />

cells that carry out dynamic biological functions. ECM is created by most cell types and<br />

is responsible for the three-dimensional structure of the tissue or organ in which they<br />

originate. Many cells also produce or secrete specialized vesicles into the ECM,<br />

which are thought to influence the extracellular environment. ECM is not simply a physical<br />

structure that connects cells in a tissue or organ. The proteins in ECM and secreted vesicles<br />

are critical to cell function, differentiation, motility, and cell-to-cell interaction. Although<br />

a number of major structural proteins of ECM and secreted vesicles have long been<br />

known, an appreciation of the role of less-abundant non-collagenous proteins has only<br />

begun to emerge. This chapter outlines a series of methods used to isolate and enrich<br />

ECM constituents and secreted vesicles from bone-forming osteoblast cells, enabling<br />

comprehensive profiles of their proteomes to be obtained by mass spectrometry. These<br />

methods can be easily adapted to study ECM and secreted vesicles in other cell types,<br />

primary cell cultures derived from animal models, or tissue specimens.<br />

Key Words: extracellular matrix; matrix vesicle; osteoblast; proteomics; mass<br />

spectrometry.<br />

1. Introduction<br />

Most cells reside in a matrix environment called the extracellular matrix<br />

(ECM), which offers structural and nutritional support, as well as a protective<br />

barrier required for cells to survive, interact, and differentiate. In addition to<br />

From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols<br />

Edited by: A. Vlahou © Humana Press, Totowa, NJ<br />




the intracellular and tissue-related processes, it is becoming increasingly clear<br />

that alterations in the ECM can affect disease pathogenesis. While<br />

much effort has been devoted to the understanding of intracellular processes,<br />

the characteristics and functions of ECM have not been equally well studied.<br />

The evidence gathered to date has shown that ECM is a complex organelle<br />

composed of various proteins that play central roles in cell differentiation,<br />

migration, and cell-to-cell communication (1,2,3). The complexity of ECM is<br />

exemplified by the skeleton. The formation and homeostasis of<br />

bone is an ongoing process throughout life, and involves the recruitment, replication,<br />

and differentiation of osteoblasts and osteoclasts (4). Osteoblasts are<br />

derived from mesenchymal stem cells and have the potential to further develop<br />

into either osteocytes or lining cells. When induced by the appropriate stimuli,<br />

such as ascorbic acid and β-glycerophosphate, osteoblasts undergo proliferation<br />

and maturation toward the osteocyte phenotype (Fig. 1) (5). This process is<br />

accompanied by the accumulation of an ECM and ultimately mineralization of<br />

the ECM in the form of hydroxyapatite (6). The deposition of hydroxyapatite<br />

in ECM is initiated by a unique type of vesicle secreted by osteoblasts, called<br />

matrix vesicles (MVs). With diameters ranging from 30–300 nm, these vesicles<br />

reside in the ECM and play a critical role in mineralization (7,8). They serve<br />

as nucleation sites for mineralization and sustain the accumulation of ECM (9).<br />

A number of proteins, such as annexins and phosphatases, have been identified<br />

within MVs. These proteins are responsible for the enrichment of calcium<br />

and phosphate within the vesicles (8,10,11,12,13). Although the presence and<br />

function of other proteins are largely unknown, changes in ECM and MV<br />

proteins are associated with diseases such as osteoporosis (14), arteriosclerosis<br />

(15,16,17,18), tumor development, and metastasis (19,20,21,22). A comprehensive<br />

profile of the proteins present in these extracellular organelles enables<br />

a better understanding of the pathophysiology underlying these clinical manifestations.<br />

The development of mass spectrometry (MS) technology, combined with<br />

appropriate protein enrichment and peptide separation strategies, has made this<br />

aim achievable (23,24,25,26).<br />

Fig. 1. The three-stage timeline of osteoblast differentiation. Mineral<br />

deposition is visualized by alizarin red staining of osteoblasts cultured in<br />

differentiation medium.

This chapter describes the extraction of ECM constituents and MVs from<br />

the osteoblast cell line MC3T3-E1, followed by the analysis of their respective<br />

proteomic profiles by liquid chromatography (LC) fractionation combined with<br />

MS analysis (27). The ECM and MVs are isolated and enriched using centrifugation<br />

and enzymatic approaches. The enrichment of MVs is confirmed by the<br />

measurement of elevated alkaline phosphatase (ALP) activity. Following the<br />

creation of a complex mixture of peptides via a tryptic digestion of the extracted<br />

proteins, this mixture is fractionated using strong cation exchange (SCX) LC.<br />

These fractions are analyzed by nanoflow reversed-phase LC-tandem mass<br />

spectrometry (nanoRPLC-MS/MS), and proteins are identified by searching the<br />

data against an appropriate protein database.<br />

2. Materials<br />

2.1. Cell Culture<br />

1. MC3T3-E1 pre-osteoblast cell line (see Note 1)<br />

2. Cell culture medium MEM (Irvine Scientific, Santa Ana, CA)<br />

3. Fetal bovine serum (Atlanta Biologicals, Atlanta, GA)<br />

4. Penicillin-streptomycin solution (10,000 I.U./ml penicillin, 10,000 μg/ml streptomycin)<br />

(Invitrogen Corp., Carlsbad, CA)<br />

5. 200 mM L-glutamine (Invitrogen Corp.)<br />

6. Growth medium: MEM supplemented with 10% fetal bovine serum, 50 U/ml<br />

penicillin, 50 μg/ml streptomycin, and 2 mM L-glutamine<br />

7. Differentiation medium: growth medium supplemented with 50 μg/ml ascorbic<br />

acid (Sigma Chemical Co., St. Louis, MO) and 10 mM β-glycerophosphate (Sigma<br />

Chemical Co.)<br />

8. Phosphate-buffered saline (PBS)<br />

9. Trypsin/EDTA (0.25% (w/v) trypsin/0.53 mM EDTA solution in Hanks’ BSS<br />

without calcium or magnesium) (ATCC, Manassas, VA)<br />
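The growth-medium recipe in item 6 follows from the stock concentrations in items 2–5 via C1V1 = C2V2. The short Python sketch below assumes a 500-ml batch and treats fetal bovine serum as a 100% stock; the batch size and function names are illustrative. The stock concentrations of ascorbic acid and β-glycerophosphate are not given here, so the differentiation supplements are left out.

```python
def stock_volume(stock_conc, final_conc, final_volume):
    """Volume of stock giving final_conc in final_volume (C1V1 = C2V2).

    stock_conc and final_conc must be in the same units.
    """
    if final_conc > stock_conc:
        raise ValueError("final concentration cannot exceed stock concentration")
    return final_conc * final_volume / stock_conc


def growth_medium_recipe(total_ml=500.0):
    """ml of each component for total_ml of growth medium."""
    additions = {
        "fetal bovine serum": stock_volume(100.0, 10.0, total_ml),          # % (v/v)
        # 1:200 also brings streptomycin from 10,000 to 50 ug/ml
        "penicillin-streptomycin": stock_volume(10_000.0, 50.0, total_ml),  # U/ml penicillin
        "L-glutamine": stock_volume(200.0, 2.0, total_ml),                  # mM
    }
    additions["MEM"] = total_ml - sum(additions.values())
    return additions


if __name__ == "__main__":
    for name, ml in growth_medium_recipe().items():
        print(f"{name}: {ml:g} ml")
```

For 500 ml, this gives 50 ml serum, 2.5 ml penicillin-streptomycin, 5 ml L-glutamine, and 442.5 ml MEM.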

2.2. Extraction of the ECM Constituents<br />

1. Liberase/blendzyme 1 (0.14 Wünsch units/ml) (Roche Applied Science, Indianapolis,<br />

IN)<br />

2. Centrifuge<br />

3. Bicinchoninic acid (BCA) protein assay reagent kit (Pierce, Rockford, IL)



2.3. Enrichment of MVs from the ECM<br />

1. Liberase/blendzyme 1 (0.14 Wünsch units/ml) (Roche Applied Science, Indianapolis,<br />

IN)<br />

2. Centrifuge<br />

2.4. Isolation of MVs from Medium<br />

1. Ultra-Clear centrifuge tubes: 1 × 3.5 in (38 ml) and 5/8 × 4 in (17 ml) (Beckman,<br />

Palo Alto, CA)<br />

2. Optima L-90K preparative ultracentrifuge (Beckman Coulter, Inc., Palo Alto, CA)<br />

2.5. Alkaline Phosphatase Assay<br />

1. Mild lysis buffer: 250 mM NaCl, 50 mM HEPES, pH 7.5, 0.1% NP-40<br />

2. ALP assay kit, including alkaline buffer (1.5 M 2-amino-2-methyl-1-propanol,<br />

pH 10.3), p-nitrophenyl phosphate (PNPP) (4 mg/ml) and p-nitrophenol (PNP)<br />

standard solution (10 μmol/ml) (Sigma, St. Louis, MO)<br />

3. Flat bottom 96-well plate<br />

4. Lumimark microplate reader (Bio-Rad, Hercules, CA)<br />

2.6. Strong Cation Exchange Liquid Chromatography of Peptides<br />

1. Trypsin Gold, mass spectrometry grade (Promega, Madison, WI)<br />

2. 25% (v/v) acetonitrile containing 0.1% (v/v) formic acid<br />

3. SCX-LC column (1 mm × 150 mm, polysulfoethyl A) (PolyLC, Columbia, MD)<br />

Fig. 2. Transmission electron microscopic image of matrix vesicles in the ultracentrifuge<br />

pellets (A). The high magnification image (B) shows fine-needle deposits and<br />

black dots, likely signs of calcification, both inside and around the vesicles. Also note<br />

the bilayer membrane of the vesicles (arrowhead).



4. Mobile phase A: 25% (v/v) acetonitrile<br />

5. Mobile phase B: 25% (v/v) acetonitrile containing 0.5 M ammonium formate, pH 3<br />

6. 0.1% (v/v) formic acid<br />

7. Vacuum centrifuge<br />

8. Laser-induced fluorescence (LIF) detector<br />

2.7. Nanoflow Reversed-phase Liquid Chromatography Tandem Mass<br />

Spectrometry<br />

1. Slurry packer model 1666 (Alltech, Columbia, MD)<br />

2. Ceramic cutter<br />

3. 75 μm i.d. × 360 μm o.d. × 12 cm long fused silica capillary column (Polymicro<br />

Technologies, Phoenix, AZ)<br />

4. 5 μm, 300 Å pore size C-18 silica-bonded stationary RP particles (Jupiter,<br />

Phenomenex, Torrance, CA)<br />

5. Agilent 1100 nanoLC system (Agilent Technologies, Palo Alto, CA) coupled with<br />

a linear ion-trap (LIT) mass spectrometer (LTQ, ThermoElectron, San Jose, CA)<br />

6. Glass sample injection vials 12 × 32 mm (Wheaton, Millville, NJ)<br />

7. Mobile phase A: 0.1% (v/v) formic acid<br />

8. Mobile phase B: 0.1% (v/v) formic acid in acetonitrile<br />

2.8. Bioinformatic Analysis<br />

1. 20-node Beowulf cluster computer server<br />

2. SEQUEST Cluster version 3.1 SR1 (Thermo Electron Corp., Waltham, MA)<br />

3. Bioworks Browser software 3.2 (Thermo Electron Corp.)<br />

2.9. Validation by Immunofluorescence Staining<br />

1. Primary antibodies: anti-annexin V, anti-emilin-1, anti-IQGAP1 (Santa Cruz<br />

Biotechnology, Inc., Santa Cruz, CA)<br />

2. Secondary antibodies: goat anti-rabbit IgG-FITC, and donkey anti-goat IgG-TR<br />

(Santa Cruz Biotechnology)<br />

3. PBS solution<br />

4. 18 × 18 × 0.15 mm thick glass cover slips<br />

5. Regular microscope glass slides<br />

6. Blocking serum: 10% normal blocking serum in PBS. The blocking serum is<br />

derived from the same species in which the secondary antibody is raised. For<br />

example, if the secondary antibody is raised in goat, use the normal goat serum<br />

diluted to 10% in PBS as the blocking serum.<br />

7. Fixative solution: 3.7% (v/v) formaldehyde in PBS<br />

8. DAPI diluted 1:50,000 in PBS (Invitrogen, Carlsbad, CA)<br />

9. ProLong mounting reagent (Invitrogen)<br />

10. Confocal fluorescence microscope LSM 510 Meta NLO (Carl Zeiss,<br />

Oberkochen, Germany)



3. Methods<br />

The ECM proteins are extracted from cultured cells by a short exposure<br />

to an ECM-degrading enzyme. To isolate MVs that are either confined to<br />

the ECM or reside in the cell culture medium, two approaches may be used:<br />

(1) For MVs confined to the ECM, an ECM-degrading enzyme is first applied<br />

followed by centrifugation and ultracentrifugation; (2) for MVs in the medium,<br />

centrifugation and ultracentrifugation are applied. The characterization of ECM<br />

and MV proteomes is performed using LC fractionation and MS analysis.<br />

3.1. Cell Culture<br />

1. Grow the murine calvaria-derived osteoblast MC3T3-E1 cells in growth medium.<br />

Change the medium every two or three days. Passage the cells with<br />

trypsin/EDTA (see Note 1).<br />

2. Once the cell culture reaches ∼50% confluency, replace the growth medium with<br />

10 ml of differentiation medium per plate to induce osteoblast differentiation.<br />

3. Extract the ECM or harvest culture medium on the day indicated in the methods<br />

below.<br />

3.2. Extraction of the ECM Constituents<br />

1. Grow MC3T3-E1 cells in differentiation medium on 10-cm plates. Change the<br />

medium every two or three days (see Note 2).<br />

2. On day 21, aspirate the medium from the plates. Wash the cells with 10 ml of<br />

PBS solution three times.<br />

3. Add 3 ml of liberase/blendzyme 1 solution to each plate. Incubate at 37°C for<br />

30 min.<br />

4. Carefully collect the digested supernatant from the plates without disturbing the<br />

cells.<br />

5. Centrifuge the supernatant at 2000×g for 5 min to remove any free cells. The<br />

resulting supernatant contains ECM proteins.<br />

6. Quantify the amount of ECM proteins using the BCA assay (see Note 3).<br />

3.3. Enrichment of MVs from the ECM<br />

1. Follow the same procedure described earlier to grow and prepare cells (see<br />

Subheading 3.2, steps 1 and 2, and Note 2).<br />

2. On day 21, aspirate the medium and wash the cells three times with PBS.<br />

3. Add 3 ml of liberase/blendzyme 1 solution to each plate. Incubate at 37°C for<br />

30 min (see Note 4).<br />

4. Collect the supernatant from the plates without disturbing the cells. Centrifuge<br />

the supernatant at 2000×g for 5 min to remove any cells that may have been<br />

detached from the plate. Collect the supernatant.<br />

5. Centrifuge the supernatant at 20,000×g at 4°C for 30 min.



6. Transfer the supernatant to the Ultra-Clear centrifuge tubes. Use the centrifuge<br />

tubes that fit the volume of the supernatant. Fill the tubes with PBS up to about<br />

2–3 mm from the top.<br />

7. Subject the supernatant to ultracentrifugation at 100,000×g at 4°C for 60 min.<br />

Carefully remove the supernatant without disturbing the pellet.<br />

8. The pellets are enriched with MVs designated as collagenase-released MVs<br />

(CRMVs) (see Note 5).<br />

9. Confirm the enrichment of CRMVs by assaying the ALP activity using an<br />

aliquot of the pellet (see Note 6 and Subheading 3.5).<br />

10. Resuspend the rest of the pellet in 25 mM NH4HCO3, pH 8.4. Quantify the<br />

amount of CRMV proteins in the pellet by BCA assay (see Note 3).<br />

3.4. Isolation of MVs from Medium<br />

1. Grow MC3T3-E1 cells in differentiation medium in four 10-cm plates.<br />

2. On day 15, collect the media from multiple plates (see Note 2).<br />

3. Separate cellular debris from the medium by centrifugation at 20,000×g for 30 min<br />

at 4°C.<br />

4. Transfer the supernatant to Ultra-Clear centrifuge tubes. Use the centrifuge<br />

tubes that fit the volume of the supernatant.<br />

5. Further centrifuge the supernatant by ultracentrifugation at 100,000×g for 60 min.<br />

6. Carefully remove the supernatant. The MVs in the pellet are designated as medium<br />

MVs (MMVs) (see Note 5 and Fig. 2).<br />

7. Resuspend an aliquot of the MMV sample in 25 mM NH4HCO3, pH 8.4.<br />

Determine the protein concentration in the pellet by BCA assay.<br />

3.5. Alkaline Phosphatase Assay<br />

1. For the standard curve: Dilute PNP standard 1:10 in dH2O. Add 0, 2, 4, 6, 8, 10,<br />

20, 30, 40, and 50 μl of the standard (i.e., 0, 2, 4, 6, 8, 10, 20, 30, 40, and<br />

50 nmol, respectively) to the wells of a flat-bottom 96-well microtiter plate. Add<br />

mild lysis buffer to make a total volume of 135 μl.<br />

2. For the CRMV and MMV samples: Resuspend an aliquot of the ultracentrifuged<br />

pellet in mild lysis buffer. Quantify the protein by BCA assay. Based on the BCA<br />

assay results, add 25 μg of protein to the 96-well microtiter plate. Add mild lysis<br />

buffer to make a total volume of 135 μl/well.<br />

3. Add 25 μl of alkaline buffer and 25 μl of p-nitrophenyl phosphate (PNPP) to each<br />

well.<br />

4. Incubate the microtiter plate at 37°C for up to 3 h. Monitor the colorimetric<br />

change every hour by measuring absorbance at 405 nm using the microtiter plate<br />

reader. Stop incubation when the absorbance of the sample reaches the range of<br />

the standards.<br />

5. Determine the ALP activity in MV samples by comparing to the PNP standard<br />

curve. Report the ALP activity as nmol PNP produced per minute per milligram<br />

of protein used (see Note 6).
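The standard-curve arithmetic in steps 1 and 5 can be sketched in Python with NumPy. Diluting the 10 μmol/ml PNP standard 1:10 gives 1 μmol/ml, which equals 1 nmol/μl, so the volumes in step 1 equal nmol of PNP. The sketch fits a straight line to absorbance at 405 nm versus nmol PNP, converts a sample reading to nmol, and divides by the incubation time and the 25 μg (0.025 mg) of protein loaded. The linear fit, the function names, and the example absorbances are assumptions, not data from this chapter.

```python
import numpy as np

STANDARD_UL = [0, 2, 4, 6, 8, 10, 20, 30, 40, 50]


def standard_nmol(volumes_ul, stock_umol_per_ml=10.0, dilution=10.0):
    """nmol PNP per well; 1 umol/ml equals 1 nmol/ul."""
    nmol_per_ul = stock_umol_per_ml / dilution
    return [v * nmol_per_ul for v in volumes_ul]


def alp_activity(sample_abs, std_nmol, std_abs, minutes, protein_ug=25.0):
    """ALP activity in nmol PNP per minute per mg protein."""
    # Step 4: the sample reading must fall within the range of the standards
    if not min(std_abs) <= sample_abs <= max(std_abs):
        raise ValueError("sample absorbance is outside the standard range")
    slope, intercept = np.polyfit(std_nmol, std_abs, 1)  # A405 = slope * nmol + intercept
    nmol = (sample_abs - intercept) / slope
    return nmol / minutes / (protein_ug / 1000.0)


if __name__ == "__main__":
    nmol = standard_nmol(STANDARD_UL)
    std_abs = [0.05 + 0.02 * n for n in nmol]   # made-up readings for illustration
    print(alp_activity(0.35, nmol, std_abs, minutes=60))  # approximately 10.0
```

With these made-up readings, 0.35 corresponds to 15 nmol PNP, and 15 nmol / 60 min / 0.025 mg gives about 10 nmol/min/mg.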



3.6. Strong Cation Exchange Liquid Chromatography of Peptides<br />

1. Digest 100 μg of ECM, CRMV, or MMV proteins in 25 mM NH4HCO3, pH 8.4,<br />

with trypsin using a trypsin-to-protein ratio of 1:40. For 100 μg of protein, add<br />

2.5 μg of trypsin. Incubate the digestion at 37°C overnight (see Note 7).<br />

2. Lyophilize the peptide digests in a vacuum centrifuge.<br />

3. Dissolve peptide digests in 100 μl of 25% (v/v) acetonitrile containing 0.1% (v/v)<br />

formic acid.<br />

4. Inject the peptides onto an SCX-LC column (1 × 150 mm, polysulfoethyl A).<br />

5. Maintain the flow rate of the column at 50 μl/min. Mobile phase A is 25% (v/v)<br />

acetonitrile, and mobile phase B is 25% (v/v) acetonitrile with 0.5 M ammonium<br />

formate (pH 3).<br />

6. Elute the peptides using the following 96-min gradient method: 3% B for 3 min,<br />

followed by a linear increase to 10% B in 43 min, a further increase to 45% B<br />

in 40 min, and then to 100% B in 10 min. Monitor the peptide separation by<br />

fluorescence (266 nm excitation/350 nm emission). Collect fractions every minute<br />

for 96 min (see Note 8).<br />

7. Based on the chromatogram, pool the adjacent fractions into a total of 20 fractions<br />

and lyophilize (see Notes 9 and 10).<br />

8. Resuspend each pooled fraction in 20 μl of 0.1% (v/v) formic acid prior to<br />

nanoRPLC-MS analysis.<br />
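The SCX parameters in steps 1, 5, and 6 can be written down as a small Python sketch. It covers three things: the trypsin amount for a given ratio, the %B at any time in the 96-min gradient (and therefore the ammonium formate concentration, since B contains 0.5 M), and one way to pool the 96 one-minute fractions into 20 contiguous pools. The chapter pools fractions by inspecting the chromatogram. Splitting at equal shares of cumulative fluorescence signal is an assumed stand-in for that judgment, and all function names are hypothetical.

```python
import numpy as np

# (time in min, %B) breakpoints of the 96-min SCX gradient in step 6
GRADIENT = [(0, 3), (3, 3), (46, 10), (86, 45), (96, 100)]


def trypsin_ug(protein_ug, ratio=40):
    """Trypsin for a 1:ratio (w/w) trypsin-to-protein digestion."""
    return protein_ug / ratio


def percent_b(t_min):
    times, pct = zip(*GRADIENT)
    return float(np.interp(t_min, times, pct))


def ammonium_formate_mm(t_min):
    """Ammonium formate in the mixed mobile phase (B = 500 mM, A = none)."""
    return percent_b(t_min) / 100.0 * 500.0


def pool_fractions(signal, n_pools=20):
    """Split fractions 1..n into n_pools contiguous pools of roughly equal total signal."""
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    if n < n_pools:
        raise ValueError("need at least as many fractions as pools")
    cum = np.cumsum(signal)
    targets = cum[-1] * np.arange(1, n_pools) / n_pools
    cuts = np.searchsorted(cum, targets, side="left") + 1
    for i in range(len(cuts)):  # force strictly increasing cuts so no pool is empty
        low = cuts[i - 1] + 1 if i else 1
        high = n - (n_pools - 1 - i)
        cuts[i] = min(max(cuts[i], low), high)
    bounds = [0, *cuts.tolist(), n]
    return [list(range(a + 1, b + 1)) for a, b in zip(bounds[:-1], bounds[1:])]
```

With a flat signal, each pool receives about five adjacent fractions. With a real fluorescence trace, pools become narrower where peptides elute densely and wider where the trace is quiet. Note 7 allows ratios down to 1:50, for which `trypsin_ug(100, ratio=50)` gives 2 μg.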

3.7. Nanoflow Reversed-Phase Liquid Chromatography Tandem Mass<br />

Spectrometry<br />

1. Cut a 12-cm piece of 75 μm i.d. × 360 μm o.d. fused silica capillary column. Use<br />

a torch to briefly flame the section about 2 cm from one end. Once the flamed<br />

section is soft, pull the column to make a 10-cm long section with a closed tip.<br />

To make a fine and flat opening at the end of the tip, lightly score near the end<br />

of the closed tip using a ceramic cutter, and then break the end away.<br />

2. Connect the column to the slurry packer. Pack the column with 5 μm, 300 Å pore<br />

size C-18 silica-bonded stationary reversed-phase particles.<br />

3. Connect the column to an Agilent 1100 nanoLC system coupled with a LIT mass<br />

spectrometer (LTQ, ThermoElectron, operated with Xcalibur 1.4 SR1 software).<br />

4. Transfer the peptide fractions into glass vials. Inject 6 μl of the solution.<br />

5. Mobile phase A is 0.1% (v/v) formic acid and B is 0.1% (v/v) formic acid in<br />

acetonitrile. Elute the peptides using the following gradient method: 2% B at<br />

500 nl/min for 30 min; a linear increase from 2 to 42% B at 250 nl/min over 110 min;<br />

42–98% B over 30 min, with the first 15 min at 250 nl/min and the last 15 min at<br />

500 nl/min; 98% B at 500 nl/min for 10 min.<br />

6. Set the capillary temperature and electrospray voltage at 160°C and 1.5 kV,<br />

respectively. The LIT-MS is operated in a data-dependent MS/MS mode where<br />

the five most abundant peptide molecular ions in every MS scan are sequentially<br />

selected for collision-induced dissociation (CID) using a normalized collision



energy of 35%. Apply dynamic exclusion to minimize repeated selection of<br />

peptides previously selected for CID (see Notes 11 and 12).<br />
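The nanoRPLC program in step 5 combines %B changes with flow-rate changes, which is easy to misread. The small Python sketch below encodes it as segments and reports %B and flow at any time, the total run time, and the total volume pumped. It assumes the 42–98% B ramp is linear over its full 30 min, so it reaches 70% B at the 15-min flow switch. That interpolation and the names used are assumptions.

```python
# (duration min, start %B, end %B, flow nl/min) for the gradient in step 5
SEGMENTS = [
    (30, 2, 2, 500),
    (110, 2, 42, 250),
    (15, 42, 70, 250),   # first half of the 42-98% B ramp (linear ramp assumed)
    (15, 70, 98, 500),   # second half, at the higher flow rate
    (10, 98, 98, 500),
]


def state_at(t_min, segments=SEGMENTS):
    """Return (%B, flow in nl/min) at time t_min from the start of the run."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    start = 0.0
    for duration, b_start, b_end, flow in segments:
        if t_min <= start + duration:
            frac = (t_min - start) / duration
            return b_start + (b_end - b_start) * frac, flow
        start += duration
    raise ValueError("time is past the end of the method")


def total_minutes(segments=SEGMENTS):
    return sum(seg[0] for seg in segments)


def total_volume_ul(segments=SEGMENTS):
    return sum(duration * flow for duration, _, _, flow in segments) / 1000.0
```

For this program, `total_minutes()` is 180 and `total_volume_ul()` is 58.75, so each fraction takes 3 h of instrument time.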
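The data-dependent acquisition logic in step 6 (the five most abundant ions per MS scan, with dynamic exclusion) can be illustrated with a simplified Python simulation. The exclusion duration (60 s) and m/z tolerance (±0.5) are assumed values because they are not stated here. The sketch also ignores charge-state screening, isotope recognition, and the instrument's actual control logic.

```python
def select_precursors(scans, top_n=5, exclusion_s=60.0, mz_tol=0.5):
    """Simulate top-N precursor selection with dynamic exclusion.

    scans: iterable of (time_s, [(mz, intensity), ...]) in acquisition order.
    Returns a list of (time_s, [selected m/z values]).
    """
    excluded = []  # (m/z, time at which the exclusion expires)
    picks = []
    for t, peaks in scans:
        excluded = [(mz, expiry) for mz, expiry in excluded if expiry > t]
        chosen = []
        for mz, _intensity in sorted(peaks, key=lambda p: p[1], reverse=True):
            if len(chosen) == top_n:
                break
            if any(abs(mz - ex_mz) <= mz_tol for ex_mz, _ in excluded):
                continue  # selected recently; let less abundant ions through
            chosen.append(mz)
            excluded.append((mz, t + exclusion_s))
        picks.append((t, chosen))
    return picks
```

Feeding the same six-ion scan three times shows the effect. The first scan picks the five strongest ions. A scan 10 s later can only pick the sixth. A scan 70 s after the start picks the top five again, because their exclusions have expired.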

3.8. Bioinformatic Analysis<br />

1. Search the tandem mass spectra against the UniProt proteomic database from<br />

the European Bioinformatics Institute (http://www.ebi.ac.uk/) with SEQUEST<br />

operating on a 40-node Beowulf cluster (SEQUEST Cluster version 3.1 SR1,<br />

Bioworks Browser 3.2). Limit the search to peptides generated with fully tryptic<br />

cleavage constraints.<br />

2. Set the criteria for legitimate peptide identifications as follows: charge-state-dependent minimum cross-correlation<br />

(Xcorr) scores of 1.9 for [M + H]⁺, 2.2 for [M + 2H]²⁺, and 3.1 for<br />

[M + 3H]³⁺, and a minimum delta correlation (ΔCn) of 0.08.<br />

3. Base protein identification exclusively on unique peptide hits, i.e., peptides whose<br />

sequence is unique to a given protein (see Notes 13 and 14).<br />
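Steps 2 and 3 amount to a filter followed by a uniqueness check. The Python sketch below applies the charge-dependent Xcorr thresholds and the ΔCn cutoff, treating each as a minimum (≥), and then keeps only peptides whose sequence maps to exactly one protein. The record layout, the peptide-to-protein map, and the rejection of charge states above 3+ (for which no threshold is given) are assumptions.

```python
from collections import defaultdict

XCORR_MIN = {1: 1.9, 2: 2.2, 3: 3.1}   # by precursor charge state
DELTA_CN_MIN = 0.08


def passes(psm):
    """Apply the charge-dependent Xcorr and minimum delta-Cn criteria."""
    z = psm["charge"]
    return (z in XCORR_MIN
            and psm["xcorr"] >= XCORR_MIN[z]
            and psm["delta_cn"] >= DELTA_CN_MIN)


def identify_proteins(psms, peptide_to_proteins):
    """Map each protein to the set of passing peptides unique to it."""
    identified = defaultdict(set)
    for psm in psms:
        if not passes(psm):
            continue
        proteins = peptide_to_proteins.get(psm["peptide"], set())
        if len(proteins) == 1:          # shared peptides are ignored
            (protein,) = proteins
            identified[protein].add(psm["peptide"])
    return dict(identified)
```

A peptide that passes the score filters but is shared by two proteins contributes to neither. This is the conservative behavior that step 3 asks for.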

3.9. Immunofluorescence Staining<br />

1. Plate 50,000 cells on glass cover slips in 6-well plates. Culture in differentiation<br />

medium.<br />

2. On day 15, briefly wash the cells with PBS.<br />

3. Fix the cells in 3.7% (v/v) formaldehyde in PBS for 10 min.<br />

4. Incubate with 10% (v/v) normal blocking serum in PBS.<br />

5. Briefly wash the cells with PBS; incubate with primary antibodies for 1.5 h.<br />

6. Wash the cells three times with PBS for 5 min each, and then incubate with<br />

secondary antibodies conjugated with fluorochrome (FITC or Texas Red) for 1 h.<br />

7. Wash the cells three times with PBS for 5 min each, including once with DAPI<br />

diluted 1:50,000 in PBS to stain nuclei.<br />

8. Mount the cover slips on microscope glass slides with ProLong mounting reagent.<br />

9. Observe the cells using a confocal fluorescence microscope (see Note 14).<br />

4. Notes<br />

1. MC3T3-E1 pre-osteoblast cells are derived from newborn murine calvaria (28).<br />

These cells closely resemble primary cell cultures in their proliferation, differentiation,<br />

and mineralization (29,30,31). The combination of ascorbic acid and<br />

β-glycerophosphate stimulates MC3T3-E1 cells to undergo differentiation, which is<br />

characterized by substantial matrix mineralization (32,33). Therefore, it is a<br />

suitable model for the enrichment of ECM and isolation of MVs.<br />

2. It is necessary to culture multiple 10-cm plates (four or more, at approximately<br />

4 × 10⁶ cells/plate) to obtain a sufficient amount of protein from the ECM<br />

or MVs.<br />

3. Protein quantitation is a common laboratory procedure. The instructions are<br />

included within the BCA assay kit (Pierce); therefore, the procedure is not<br />

described in this chapter.


240 Xiao et al.<br />

4. Liberase Blendzyme 1 is a mixture of highly purified collagenase and<br />

dispase that offers gentler protease activity than other ECM-degrading<br />

enzymes. Note that four Blendzyme mixtures of increasing enzymatic<br />

strength are available from Roche; Blendzyme 1 is the mildest. The<br />

digestion time varies depending on the cell or tissue type. Alternatively, collagenase/dispase<br />

(1 mg/ml collagenase/dispase in PBS, containing collagenase at<br />

0.1 U/ml and dispase at 0.8 U/ml) (Sigma Chemical Co., St. Louis, MO) can<br />

be used; collagenase/dispase enzyme mixtures are commonly used to digest the<br />

ECM.<br />

5. Two approaches can be used to isolate MVs, either from the ECM or directly<br />

from the cell culture medium. In the first approach, enzymatic digestion and<br />

ultracentrifugation are combined to release MVs embedded in the ECM (designated<br />

CRMVs). In the second approach, the medium is ultracentrifuged<br />

to isolate MVs, designated MMVs (34). To confirm the enrichment of<br />

MVs, the ultracentrifugation pellets are fixed and examined by transmission<br />

electron microscopy (Fig. 2).<br />

6. The enzymatic activity of alkaline phosphatase (ALP) is a standard marker for MV<br />

isolation (35,36).<br />

7. Instead of using the buffer supplied with the trypsin, it is preferable to<br />

resuspend trypsin in 25 mM NH₄HCO₃, pH 8.4. The trypsin-to-protein ratio (w/w)<br />

should be between 1:40 and 1:50. Incubate the digestion mixture overnight<br />

(approximately 16 h).<br />
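Once the protein amount has been determined with the BCA assay (Note 3), the trypsin range follows directly from the 1:40 to 1:50 ratio. A minimal sketch, assuming the ratio is by mass; the 100-µg input is a hypothetical example.

```python
def trypsin_amount_ug(protein_ug, ratios=(50, 40)):
    """Trypsin mass (µg) for trypsin-to-protein ratios of 1:50 to 1:40 (w/w assumed)."""
    if protein_ug <= 0:
        raise ValueError("protein amount must be positive")
    low, high = (protein_ug / r for r in ratios)
    return low, high


# Example: 100 µg of ECM protein measured by BCA assay (hypothetical amount)
print(trypsin_amount_ug(100))  # (2.0, 2.5)
```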

8. The LIF detector used in this method can be constructed in-house (37). The<br />

LIF detector is more sensitive than a conventional lamp-based fluorescence<br />

detector. The use of a LIF detector is particularly advantageous when a narrow<br />

bore column (


Analysis of ECM and Secreted Vesicle Proteomes 241<br />

peptide is capable of identifying more proteins than the online procedure. Thus,<br />

the offline separation is described in this chapter.<br />

10. The pooling step is optional. The peptide fractions can be pooled according to the<br />

complexity of the chromatogram; in general, pooling into about 20 fractions is<br />

appropriate. Pooling saves LC-MS running time without compromising the number<br />

of proteins that the approach can identify.<br />
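The chapter does not specify how chromatogram complexity is translated into pools. One possible approach, shown here as a hypothetical sketch, is to combine adjacent fractions so that each pool holds a similar share of the integrated detector signal. The per-fraction signal values and the 20-pool target are inputs the user supplies.

```python
def pool_fractions(signal, n_pools=20):
    """Group contiguous fractions into about n_pools pools of similar summed signal.

    signal: baseline-corrected (non-negative) signal integrated per collected
    fraction, in elution order. Returns lists of 1-based fraction numbers.
    """
    total = sum(signal)
    if total <= 0:
        raise ValueError("chromatogram signal must be positive")
    target = total / n_pools
    pools = {}
    cumulative = 0.0
    for number, value in enumerate(signal, start=1):
        # Assign each fraction by the cumulative signal at its midpoint, so pools stay
        # contiguous and busy regions of the chromatogram are split into more pools.
        midpoint = cumulative + value / 2
        index = min(n_pools - 1, int(midpoint / target))
        pools.setdefault(index, []).append(number)
        cumulative += value
    return [pools[k] for k in sorted(pools)]


# A flat chromatogram of 40 fractions collapses into 20 pools of two fractions each.
print(pool_fractions([1.0] * 40)[:3])
```

A single dominant peak can absorb more than one pool's share of signal, so the result may contain slightly fewer than n_pools pools; this matches the "about 20" guidance rather than enforcing an exact count.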

11. In general, the MS data acquisition time is set to 150 min, starting 30 min after<br />

the beginning of the peptide elution gradient and synchronized to end with the<br />

elution gradient.<br />

12. Alternatively, the resulting ECM, CRMV, or MMV protein samples<br />

can be resolved by SDS-PAGE and the proteins visualized by Coomassie<br />

staining. Protein bands of greater intensity than the corresponding bands<br />

from undifferentiated cells can be excised, digested in-gel with<br />

trypsin, and analyzed by nanoRPLC-MS/MS (27).<br />

13. Proteins identified in both the CRMV and MMV preparations can be<br />

considered authentic MV proteins with a higher degree of confidence than<br />

those identified in only one of the preparations.<br />
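Sorting identifications into these confidence tiers is a set operation on the two protein lists. A minimal sketch, assuming each preparation's identifications are available as a collection of accessions; the accessions below are placeholders, not results from this study.

```python
def mv_confidence_tiers(crmv_ids, mmv_ids):
    """Split MV protein identifications by how many preparations support them."""
    crmv, mmv = set(crmv_ids), set(mmv_ids)
    return {
        "both (higher confidence)": crmv & mmv,
        "CRMV only": crmv - mmv,
        "MMV only": mmv - crmv,
    }


# Placeholder accessions for illustration only
tiers = mv_confidence_tiers(["P1", "P2", "P4"], ["P2", "P3", "P4"])
for tier, proteins in tiers.items():
    print(tier, sorted(proteins))
```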

14. Gene Ontology (GO) (www.geneontology.org) can be used to annotate the<br />

identified proteins and categorize them according to their cellular component,<br />

molecular function, and the biological processes they are associated with.<br />

15. Known MV proteins can be validated by Western blotting<br />

or immunofluorescence staining. Annexin V, a known constituent of MVs, is<br />

used as a protein landmark to locate vesicles in these experiments (38). The<br />

osteoblast cells can be double-stained with anti-annexin V and an additional<br />

antibody against either the extracellular protein emilin-1 or the Ras GTPase-activating-like<br />

protein IQGAP1 (27).<br />

Acknowledgments<br />

This project has been funded in whole or in part with Federal funds from<br />

the National Cancer Institute, National Institutes of Health, under Contract No.<br />

N01-CO-12400. The content of this publication does not necessarily reflect<br />

the views or policies of the Department of Health and Human Services, nor<br />

does the mention of trade names, commercial products, or organizations imply<br />

endorsement by the US Government.<br />

References<br />

1. Holmbeck, K. and Szabova, L. (2006) Aspects of extracellular matrix remodeling<br />

in development and disease. Birth Defects Res C Embryo Today 78, 11–23.<br />

2. Brooke, B. S., Karnik, S. K. and Li, D. Y. (2003) Extracellular matrix in vascular<br />

morphogenesis and disease: structure versus signal. Trends Cell Biol 13, 51–56.<br />

3. Tahinci, E. and Lee, E. (2004) The interface between cell and developmental<br />

biology. Curr Opin Genet Dev 14, 361–366.



4. Harada, S. and Rodan, G. A. (2003) Control of osteoblast function and regulation<br />

of bone mass. Nature 423, 349–355.<br />

5. Beck, G. R., Jr. (2003) Inorganic phosphate as a signaling molecule in osteoblast<br />

differentiation. J Cell Biochem 90, 234–243.<br />

6. Aubin, J. E. (2001) Regulation of osteoblast formation and function. Rev Endocr<br />

Metab Disord 2, 81–94.<br />

7. Anderson, H. C. (1995) Molecular biology of matrix vesicles. Clin Orthop Relat<br />

Res, 266–280.<br />

8. Anderson, H. C. (2003) Matrix vesicles and calcification. Curr Rheumatol Rep 5,<br />

222–226.<br />

9. Anderson, H. C., Garimella, R. and Tague, S. E. (2005) The role of matrix vesicles<br />

in growth plate development and biomineralization. Front Biosci 10, 822–837.<br />

10. Kirsch, T. (2005) Annexins – their role in cartilage mineralization. Front Biosci<br />

10, 576–581.<br />

11. Hessle, L., Johnson, K. A., Anderson, H. C., Narisawa, S., Sali, A., Goding, J. W.,<br />

Terkeltaub, R. and Millan, J. L. (2002) Tissue-nonspecific alkaline phosphatase<br />

and plasma cell membrane glycoprotein-1 are central antagonistic regulators of<br />

bone mineralization. Proc Natl Acad Sci USA 99, 9445–9449.<br />

12. Johnson, K. A., Hessle, L., Vaingankar, S., Wennberg, C., Mauro, S., Narisawa, S.,<br />

Goding, J. W., Sano, K., Millan, J. L. and Terkeltaub, R. (2000) Osteoblast tissue-nonspecific<br />

alkaline phosphatase antagonizes and regulates PC-1. Am J Physiol<br />

Regul Integr Comp Physiol 279, R1365–1377.<br />

13. Morris, D. C., Masuhara, K., Takaoka, K., Ono, K. and Anderson, H. C. (1992)<br />

Immunolocalization of alkaline phosphatase in osteoblasts and matrix vesicles of<br />

human fetal bone. Bone Miner 19, 287–298.<br />

14. Baldini, V., Mastropasqua, M., Francucci, C. M. and D’Erasmo, E. (2005) Cardiovascular<br />

disease and osteoporosis. J Endocrinol Invest 28, 69–72.<br />

15. Dao, H. H., Essalihi, R., Bouvet, C. and Moreau, P. (2005) Evolution and<br />

modulation of age-related medial elastocalcinosis: impact on large artery stiffness<br />

and isolated systolic hypertension. Cardiovasc Res 66, 307–317.<br />

16. Reynolds, J. L., Joannides, A. J., Skepper, J. N., McNair, R., Schurgers, L. J.,<br />

Proudfoot, D., Jahnen-Dechent, W., Weissberg, P. L. and Shanahan, C. M. (2004)<br />

Human vascular smooth muscle cells undergo vesicle-mediated calcification in<br />

response to changes in extracellular calcium and phosphate concentrations: a<br />

potential mechanism for accelerated vascular calcification in ESRD. J Am Soc<br />

Nephrol 15, 2857–2867.<br />

17. Abedin, M., Tintut, Y. and Demer, L. L. (2004) Vascular calcification: mechanisms<br />

and clinical ramifications. Arterioscler Thromb Vasc Biol 24, 1161–1170.<br />

18. Tintut, Y. and Demer, L. L. (2001) Recent advances in multifactorial regulation<br />

of vascular calcification. Curr Opin Lipidol 12, 555–560.<br />

19. Stewart, D. A., Cooper, C. R. and Sikes, R. A. (2004) Changes in extracellular<br />

matrix (ECM) and ECM-associated proteins in the metastatic progression of<br />

prostate cancer. Reprod Biol Endocrinol 2, 2.



20. Yin, J. J., Pollock, C. B. and Kelly, K. (2005) Mechanisms of cancer metastasis to<br />

the bone. Cell Res 15, 57–62.<br />

21. Mundy, G. R. (2002) Metastasis to bone: causes, consequences and therapeutic<br />

opportunities. Nat Rev Cancer 2, 584–593.<br />

22. Roodman, G. D. (2004) Mechanisms of bone metastasis. N Engl J Med 350,<br />

1655–1664.<br />

23. Yates, J. R., III. (2004) Mass spectral analysis in proteomics. Annu Rev Biophys<br />

Biomol Struct 33, 297–316.<br />

24. Yates, J. R., III, Gilchrist, A., Howell, K. E. and Bergeron, J. J. (2005) Proteomics<br />

of organelles and large cellular structures. Nat Rev Mol Cell Biol 6, 702–714.<br />

25. Domon, B. and Aebersold, R. (2006) Mass spectrometry and protein analysis.<br />

Science 312, 212–217.<br />

26. Aebersold, R. and Mann, M. (2003) Mass spectrometry-based proteomics. Nature<br />

422, 198–207.<br />

27. Xiao, Z., Camalier, C. E., Nagashima, K., Chan, K. C., Lucas, D. A., de la<br />

Cruz, M. J., Gignac, M., Lockett, S., Issaq, H. J., Veenstra, T. D., Conrads, T. P.<br />

and Beck, G. R., Jr. (2006) Analysis of the extracellular matrix vesicle proteome in<br />

mineralizing osteoblasts. J Cell Physiol, In press.<br />

28. Sudo, H., Kodama, H. A., Amagai, Y., Yamamoto, S. and Kasai, S. (1983) In vitro<br />

differentiation and calcification in a new clonal osteogenic cell line derived from<br />

newborn mouse calvaria. J Cell Biol 96, 191–198.<br />

29. Choi, J. Y., Lee, B. H., Song, K. B., Park, R. W., Kim, I. S., Sohn, K. Y.,<br />

Jo, J. S. and Ryoo, H. M. (1996) Expression patterns of bone-related proteins during<br />

osteoblastic differentiation in MC3T3-E1 cells. J Cell Biochem 61, 609–618.<br />

30. Quarles, L. D., Yohay, D. A., Lever, L. W., Caton, R. and Wenstrup, R. J.<br />

(1992) Distinct proliferative and differentiated stages of murine MC3T3-E1 cells<br />

in culture: an in vitro model of osteoblast development. J Bone Miner Res 7,<br />

683–692.<br />

31. Franceschi, R. T., Iyer, B. S. and Cui, Y. (1994) Effects of ascorbic acid on collagen<br />

matrix formation and osteoblast differentiation in murine MC3T3-E1 cells. J Bone<br />

Miner Res 9, 843–854.<br />

32. Beck, G. R., Jr, Sullivan, E. C., Moran, E. and Zerler, B. (1998) Relationship<br />

between alkaline phosphatase levels, osteopontin expression, and mineralization in<br />

differentiating MC3T3-E1 osteoblasts. J Cell Biochem 68, 269–280.<br />

33. Beck, G. R., Jr, Zerler, B. and Moran, E. (2001) Gene array analysis of osteoblast<br />

differentiation. Cell Growth Differ 12, 61–83.<br />

34. Johnson, K., Moffa, A., Chen, Y., Pritzker, K., Goding, J. and Terkeltaub, R. (1999)<br />

Matrix vesicle plasma cell membrane glycoprotein-1 regulates mineralization by<br />

murine osteoblastic MC3T3 cells. J Bone Miner Res 14, 883–892.<br />

35. Ali, S. Y., Sajdera, S. W. and Anderson, H. C. (1970) Isolation and characterization<br />

of calcifying matrix vesicles from epiphyseal cartilage. Proc Natl Acad Sci USA<br />

67, 1513–1520.<br />

36. Dean, D. D., Schwartz, Z., Bonewald, L., Muniz, O. E., Morales, S., Gomez, R.,<br />

Brooks, B. P., Qiao, M., Howell, D. S. and Boyan, B. D. (1994) Matrix vesicles



produced by osteoblast-like cells in culture become significantly enriched in<br />

proteoglycan-degrading metalloproteinases after addition of beta-glycerophosphate<br />

and ascorbic acid. Calcif Tissue Int 54, 399–408.<br />

37. Chan, K. C., Muschik, G. M. and Issaq, H. J. (2000) Solid-state UV laser-induced<br />

fluorescence detection in capillary electrophoresis. Electrophoresis 21, 2062–2066.<br />

38. Wang, W., Xu, J. and Kirsch, T. (2005) Annexin V and terminal differentiation of<br />

growth plate chondrocytes. Exp Cell Res 305, 156–165.


IV<br />

Clinical Proteomics and Antibody Arrays


14<br />

Miniaturized Parallelized Sandwich Immunoassays<br />

Hsin-Yun Hsu, Silke Wittemann, and Thomas O. Joos<br />

Summary<br />

This chapter describes the development and use of bead-based miniaturized multiplexed<br />

sandwich immunoassays for focused protein profiling. Bead-based protein arrays<br />

or suspension microarrays allow simultaneous analysis of a variety of parameters within<br />

a single experiment. In suspension microarrays, capture antibodies are coupled onto color-coded<br />

microspheres.<br />

Applications of suspension microarrays to the analysis of proteins present in different<br />

types of body fluids, such as serum or plasma, cerebrospinal, pleural, and synovial fluids,<br />

as well as cell culture supernatants, are described. The chapter covers<br />

the generation of suspension microarrays, sample preparation, processing of suspension<br />

microarrays, validation of analytical performance, and finally pattern generation using<br />

bioinformatics tools.<br />

Key Words: suspension microarray; microspheres; immunoassay; protein profiling;<br />

biological fluids; serum; pleura; cell culture supernatants; cerebrospinal fluid; synovial<br />

fluid.<br />

1. Introduction<br />

Protein microarray technology allows simultaneous determination of a large<br />

variety of analytes from a minute amount of sample within a single experiment.<br />

Assay systems based on this technology are currently used to identify<br />

and quantify proteins. Protein microarray technology is of major interest<br />

for proteomic research in basic and applied biology as well as for diagnostic<br />

applications. Miniaturized and parallelized assay systems have reached adequate<br />

sensitivity, and hence have the potential to replace singleplex analysis systems.<br />

From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols<br />

Edited by: A. Vlahou © Humana Press, Totowa, NJ<br />

247


248 Hsu et al.<br />

Besides the well-known planar microarray-based systems, which are well<br />

suited to screening a large number of target proteins, bead-based systems, termed<br />

suspension assays, are an attractive alternative, especially when the number<br />

of parameters of interest is comparatively low. Suspension assay systems employ<br />

different color-coded or size-coded microspheres as the solid support for capture<br />

molecules. A flow cytometer, which is able to identify each individual type of<br />

bead and quantify the amount of captured targets on each individual bead, is<br />

used as a readout system. In the first step, antigen-specific capture antibodies<br />

are immobilized on the individual bead type. Different bead types are combined<br />

and incubated with the sample of interest. A labeled secondary antibody<br />

detects the captured analytes and is visualized with a fluorescent reporter<br />

system. Sensitivity, reliability, and accuracy are similar to those observed with<br />

standard microtiter ELISA procedures (1). Color-coded microspheres can be<br />

used to perform up to a hundred different assay types simultaneously. The flow<br />

cytometer identifies several thousand microspheres per second and simultaneously<br />

quantitates the amount of captured analytes (2,3,4,5,6). Suspension<br />

microarrays are currently well advanced within the field of miniaturized multiplexed<br />

ligand binding assays with respect to automation and throughput (7).<br />

Miniaturized parallelized assay systems have to demonstrate appropriate<br />

sensitivity, precision, and reliability before they can be applied for screening<br />

or diagnostic purposes.<br />

This chapter describes the development and use of suspension antibody<br />

microarrays for protein profiling of several human body fluids. Standard methodological<br />

guidance for validating immunoassays (10,11,12) is described, together with the determination of<br />

the sensitivity, precision, and accuracy of the multiplexed analysis.<br />

In the final section, data analysis is described to show how to deal with high-dimensional<br />

data sets (13,14).<br />

2. Materials<br />

2.1. Equipment<br />

1. Centrifuge: 5415D (Eppendorf)<br />

2. Vortex mixer (Neolab)<br />

3. Ultrasonic bath<br />

4. Thermomixer (Eppendorf)<br />

5. Luminex 100 instrument (Luminex Corp.)<br />

6. Vacuum manifold (Millipore)<br />

7. Filter plates (Millipore 96-well plate, cat. # MAB1250)<br />

8. Microcentrifuge tubes (Starlab 1.5 ml, cat. # I1415-2500)<br />

9. Carboxylated beads (Qiagen, cat. # 922400 or Luminex Corp.)<br />

10. Deionized water


Miniaturized Parallelized Sandwich Immunoassays 249<br />

2.2. Common Reagents and Materials<br />

1. Bovine serum albumin (BSA, Roth T844.2)<br />

2. PBS (Fisher Scientific, cat. # 9472615)<br />

3. EDC [1-ethyl-3-(3-dimethylaminopropyl)carbodiimide] (Pierce)<br />

4. Sulfo-NHS (N-hydroxysulfosuccinimide) (Pierce)<br />

5. Detection reagent: Streptavidin-phycoerythrin (Streptavidin-PE) stock solution<br />

(1 mg/ml) in 100 mM NaCl, 100 mM sodium phosphate, pH 7.5, containing<br />

2 mM sodium azide (Molecular Probes, cat. #S21388)<br />

2.3. Buffers<br />

1. Activation buffer [100 mM sodium phosphate (Na₂HPO₄), pH 6.2]<br />

2. Coupling buffer (50 mM MES, pH 5.0)<br />

3. Washing buffer [PBS, pH 7.4, and 0.05% (v/v) Tween-20]<br />

4. Blocking/storage (B/S) buffer: 1% BSA fraction IV (Roth, cat. # T844.2) in 1×<br />

PBS<br />

5. Assay buffer formulation: 1% BSA fraction IV in 1×PBS<br />

3. Methods<br />

3.1. Principle<br />

Suspension antibody microarrays are based on sandwich<br />

immunoassays, as represented in Fig. 1. First, capture antibodies are coupled to<br />

carboxylated microspheres. To perform the assay,<br />

the samples are incubated with the coupled microspheres. Bound analytes are<br />

detected with biotinylated antibodies, and phycoerythrin-labeled streptavidin is used<br />

for signal detection. Finally, a flow cytometer identifies each microsphere by its color code<br />

and measures its phycoerythrin signal, allowing quantitation of the captured analytes.<br />
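Conceptually, the readout groups individual bead events by color code (bead region) and summarizes the reporter fluorescence within each group. The sketch below assumes the events have already been classified into regions by the instrument. The median is used because the platform reports median fluorescence intensity (MFI); the 50-bead minimum is a hypothetical quality threshold.

```python
from collections import defaultdict
from statistics import median


def mfi_per_region(events, min_beads=50):
    """Return the median reporter (PE) fluorescence for each bead region.

    events: iterable of (region_id, reporter_intensity) pairs, one per bead.
    Regions with fewer than min_beads events are reported as None.
    """
    by_region = defaultdict(list)
    for region, intensity in events:
        by_region[region].append(intensity)
    return {
        region: (median(values) if len(values) >= min_beads else None)
        for region, values in by_region.items()
    }


# Toy event list: region 12 has enough beads, region 35 does not (min_beads=3 here)
events = [(12, 100.0), (12, 120.0), (12, 110.0), (35, 5000.0), (35, 5200.0)]
print(mfi_per_region(events, min_beads=3))
```

The median is robust to the occasional bead doublet or aggregate with an extreme signal, which is one reason it is preferred over the mean for this kind of per-bead data.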

3.2. Production of Suspension Microarrays—Antibody Coupling to<br />

Carboxylated Microspheres (see Note 1)<br />

Using standard carbodiimide coupling chemistry, the antibodies are covalently<br />

immobilized on carboxylated beads via primary amine groups, mainly those of lysine side chains.<br />

Before coupling, the beads are first activated using EDC/Sulfo-NHS.<br />

Fig. 1. Processing of suspension microarrays. Schematic representation of the steps<br />

required for performing a suspension microarray immunoassay. Figure reproduced from<br />

Proteomics of Human Body Fluids: Principles, Methods and Applications, edited by<br />

Thongboonkerd (2006).<br />



The antibody preparation must not contain foreign proteins (e.g., BSA), azide, glycine, Tris, or<br />

other reagents containing primary amine groups. Otherwise, the antibodies<br />

must be purified by gel-filtration chromatography or dialysis before use.<br />

3.2.1. Bead Activation<br />

1. Sonicate the carboxylated bead stock suspension for 15–20 s to yield a homogeneous<br />

bead suspension. Thoroughly vortex the bead stock suspension for at least<br />

10 s. Take 2.5 × 10⁶ beads per coupling reaction.<br />

2. Transfer this aliquot of the bead stock suspension to a Starlab microcentrifuge tube.<br />

3. Briefly centrifuge the bead suspension (a quick spin up to 3000×g is sufficient)<br />

and discard the supernatant.<br />

4. Wash the beads with 80 μl activation buffer. Briefly vortex and centrifuge at<br />

10,000×g for 2 min. Discard the supernatant and repeat washing.<br />

5. Resuspend the beads in 80 μl activation buffer. Sonicate for 15–20 s to yield a<br />

homogeneous bead suspension.<br />

6. Freshly prepare EDC solution (50 mg/ml) and Sulfo-NHS solution (50 mg/ml)<br />

(see Notes 2 and 3).<br />

7. Add 10 μl of EDC solution and 10 μl of Sulfo-NHS solution to the bead suspension.<br />

Incubate for 20 min at room temperature (15–25°C) in the dark.<br />
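The activation mix in steps 5 and 7 contains 80 μl of beads plus 10 μl each of EDC and Sulfo-NHS, so each reagent ends up at one-tenth of its 50 mg/ml stock concentration. The molecular weights below are for EDC hydrochloride and the sodium salt of Sulfo-NHS, the forms commonly supplied; treat them as assumptions and check the reagent label.

```python
# Molecular weights (g/mol) of the forms commonly supplied; these are assumptions,
# so check the label of the reagent actually used.
MW_EDC_HCL = 191.70        # EDC hydrochloride
MW_SULFO_NHS_NA = 217.13   # Sulfo-NHS sodium salt


def final_concentration(stock_mg_per_ml, added_ul, total_ul, mw_g_per_mol):
    """Return (mg/ml, mM) of a reagent after adding added_ul of stock to total_ul."""
    mg_per_ml = stock_mg_per_ml * added_ul / total_ul
    millimolar = mg_per_ml / mw_g_per_mol * 1000.0  # mg/ml = g/l; g/l / (g/mol) = mol/l
    return mg_per_ml, millimolar


total_ul = 80 + 10 + 10  # bead suspension (step 5) + EDC + Sulfo-NHS (step 7)
edc = final_concentration(50.0, 10, total_ul, MW_EDC_HCL)
sulfo_nhs = final_concentration(50.0, 10, total_ul, MW_SULFO_NHS_NA)
print(f"EDC: {edc[0]:.1f} mg/ml ({edc[1]:.1f} mM)")                    # EDC: 5.0 mg/ml (26.1 mM)
print(f"Sulfo-NHS: {sulfo_nhs[0]:.1f} mg/ml ({sulfo_nhs[1]:.1f} mM)")  # Sulfo-NHS: 5.0 mg/ml (23.0 mM)
```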

3.2.2. Coupling of Antibodies to Activated<br />

Carboxylated Beads<br />

8. Dilute the antibody stock solution with coupling buffer to a concentration of<br />

100 μg/ml in a volume of 500 μl.<br />

9. Centrifuge the beads at 10,000×g for 2 min and discard the supernatant.<br />

10. Wash the beads with 500 μl of coupling buffer. Briefly vortex and centrifuge at<br />

10,000×g for 2 min. Discard the supernatant and repeat washing.<br />

11. Add the diluted antibody solution (500 μl) from step 8.<br />

12. Wrap the tube in aluminum foil to exclude light. Gently agitate the tube with<br />

activated beads and antibody solution on a plate shaker for 2 h at room temperature<br />

(15–25°C).<br />
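Steps 1 and 8 together fix the antibody-to-bead ratio: 500 μl at 100 μg/ml is 50 μg of antibody for 2.5 × 10⁶ beads, or 20 μg per 10⁶ beads. The sketch below assumes that coupling scales linearly with bead number at a constant antibody concentration, which should be verified experimentally before scaling up.

```python
def antibody_ug_per_million_beads(ab_ug_per_ml=100.0, volume_ul=500.0, beads=2.5e6):
    """Antibody offered (µg) per 1e6 beads in the coupling reaction of steps 1 and 8."""
    antibody_ug = ab_ug_per_ml * volume_ul / 1000.0
    return antibody_ug / (beads / 1e6)


def scaled_coupling(beads, ug_per_million, ab_ug_per_ml=100.0):
    """Antibody mass (µg) and volume (µl) that keep the same ratio for another bead count.

    Assumes linear scaling at a constant 100 µg/ml antibody concentration.
    """
    antibody_ug = ug_per_million * beads / 1e6
    volume_ul = antibody_ug / ab_ug_per_ml * 1000.0
    return antibody_ug, volume_ul


ratio = antibody_ug_per_million_beads()   # 20.0 µg per 1e6 beads
print(ratio)
print(scaled_coupling(5e6, ratio))        # twice the beads: 100 µg in 1000 µl
```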

3.2.3. Washing and Storage of Coupled<br />

Carboxylated Beads<br />

13. Centrifuge the beads at 10,000×g for 2 min and carefully remove and discard<br />

the supernatant.<br />

14. Wash the beads with 500 μl of washing buffer. Briefly vortex and centrifuge at<br />

10,000×g for 2 min. Discard the supernatant and repeat washing.<br />

15. Resuspend the bead pellet in 1 ml B/S buffer including 0.05% (w/v) azide.<br />

16. Determine the bead concentration of the suspension using a cell-counting<br />

chamber.



3.2.4. Counting Beads Using a Cell-Counting Chamber<br />

1. Add 5 μl of beads to 45 μl of PBS and mix (1:10 dilution).<br />

2. Fill the hemacytometer with 10 μl of the diluted sample by placing the pipette tip<br />

against the loading “V” of the hemacytometer at a 45° angle and slowly releasing the sample<br />

between the slide and the cover slip until the counting chamber<br />

is loaded. It is important to fill both sides of the chamber and wait for 2–3 min<br />

to allow the beads to settle.<br />

3. Count the beads in two opposite corner squares of the counting grid and take the average.<br />

Each of the nine squares on the grid has an area of 1 mm², and the coverglass<br />

rests 0.1 mm above the floor of the chamber. Thus, the volume over each<br />

square is 0.1 mm³, or 0.1 μl (10⁻⁴ ml). Multiply the average number of beads per<br />

square by 10,000 to obtain the number of beads per milliliter<br />

of diluted sample, and then multiply by the dilution factor of 10 to obtain beads/ml in the stock.<br />

4. Store the beads at 25×, typically 5 × 10⁶ beads/ml.<br />
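The counting arithmetic in step 3 and the bead dilution used later in the assay (1500 beads of each type per well in 25 μl) can be combined in a short calculation. A sketch under stated assumptions: counts come from 1-mm² squares of a standard hemacytometer, bead regions are identified by hypothetical labels, and the 10% pipetting excess is a hypothetical default.

```python
from statistics import mean


def beads_per_ml(square_counts, dilution_factor=10):
    """Stock bead concentration from counts in 1-mm² hemacytometer squares."""
    # Each square holds 1 mm² x 0.1 mm = 0.1 mm³ = 1e-4 ml, hence the factor 1e4.
    return mean(square_counts) * 1e4 * dilution_factor


def working_suspension(stock_beads_per_ml, wells, beads_per_well=1500,
                       volume_per_well_ul=25.0, excess=1.1):
    """Volumes (µl) for a bead mix with beads_per_well of each type per volume_per_well_ul.

    stock_beads_per_ml: {bead_region: stock concentration}. excess adds a
    pipetting surplus (a hypothetical 10% by default).
    """
    total_ul = wells * volume_per_well_ul * excess
    stock_ul = {
        region: wells * beads_per_well * excess / conc * 1000.0
        for region, conc in stock_beads_per_ml.items()
    }
    diluent_ul = total_ul - sum(stock_ul.values())
    if diluent_ul < 0:
        raise ValueError("stocks are too dilute for the requested volume")
    return stock_ul, diluent_ul


stock = beads_per_ml([52, 48])   # two corner squares, 1:10 dilution
print(stock)                     # 5000000.0 beads/ml
print(working_suspension({"region A": stock, "region B": stock}, wells=96, excess=1.0))
```

All bead types share one working suspension, so the diluent volume is the total volume minus the summed stock volumes of every bead type in the multiplex.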

3.3. Processing of Bead-Based Multiplex Assays<br />

3.3.1. Sample Preparation<br />

This section describes how clinical specimens and cell cultures are prepared for<br />

multiplexed assays. Subheading 3.3.1.1 describes the use<br />

of serum or plasma; Subheading 3.3.1.2, the analysis of proteins<br />

present in cell culture supernatants; and Subheading 3.3.1.3, the sample<br />

preparation of cerebrospinal, synovial, and pleural fluids.<br />

3.3.1.1. Serum or Plasma Samples<br />

Serum and plasma samples should be centrifuged (8000×g) before the assay<br />

to remove particulates and lipid layers, which would otherwise clog the filter<br />

plate and the sample needle. The samples should be handled as biohazards<br />

because they may carry infectious agents. Freeze-thaw cycles can cause<br />

measurable degradation of some proteins (e.g., cytokines), so the samples<br />

should be aliquoted before any experiment, and the aliquots should be stored at<br />

–80°C. When we analyzed eight matched serum and plasma<br />

samples on the Luminex platform, a freeze-thaw cycle did not change the measured<br />

levels of TNF, Eotaxin, IL-13, MCP-1, IFN,<br />

IL-12p70, MIP-1, IP-10, or GM-CSF. There was, however, a significant<br />

increase in IL-1 after freeze-thaw, suggesting that this process may liberate<br />

IL-1 from insoluble receptors. IL-1 and MCP-1 levels were significantly<br />

higher in plasma than in the matched serum samples, whereas IP-10 was higher in<br />

serum. Figure 2 shows freeze-thaw experiments performed to evaluate 10-plex soluble<br />

receptor assays. Signals from some analytes appeared slightly decreased<br />

after a freeze-thaw cycle; however, no statistically significant differences were<br />

observed. Another important consideration in analyzing serum or plasma<br />

samples is the need for an appropriate diluent (described in Subheading 3.3.2).<br />

[Fig. 2: bar chart of MFI (log scale, 1–10,000) for fresh and thawed serum samples; analytes gp130, ICAM, Fas, TNFRII, VCAM, IL-2R, E-sel, TNFRI, RAGE, and MIF.]<br />

Fig. 2. Serum samples were drawn from three healthy donors. Each sample was<br />

divided into two parts: one part was measured directly after the serum was obtained, and the<br />

other part was subjected to a freeze-thaw cycle. Soluble receptors were analyzed using<br />

Luminex technology. There were no significant differences in MFI signals attributable<br />

to the freeze-thaw cycle.<br />
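The text does not state which statistical test was applied to the freeze-thaw data. With three donors measured fresh and after thawing, one option is a paired t-test on log-transformed MFI values, sketched below. The log transformation and the choice of test are assumptions. The critical value 4.303 applies only to two degrees of freedom (three pairs) at a two-sided 5% level, and the MFI values are made up for illustration.

```python
import math
from statistics import mean, stdev

# Two-sided 5% critical value of Student's t with 2 degrees of freedom (three donors).
T_CRIT_DF2 = 4.303


def paired_t_log(fresh, thawed):
    """Paired t statistic on log10(MFI) differences (thawed minus fresh)."""
    if len(fresh) != len(thawed) or len(fresh) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [math.log10(t) - math.log10(f) for f, t in zip(fresh, thawed)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


def significant_change(fresh, thawed, t_crit=T_CRIT_DF2):
    """True if |t| exceeds t_crit; t_crit must match n - 1 degrees of freedom."""
    return abs(paired_t_log(fresh, thawed)) > t_crit


# Made-up MFI values for one analyte in three donors, for illustration only
fresh = [1000.0, 1200.0, 900.0]
thawed = [950.0, 1180.0, 880.0]
print(round(paired_t_log(fresh, thawed), 2), significant_change(fresh, thawed))
```

With only three donors the test has little power, so a consistent but small decrease, like the one described for some analytes, can easily fail to reach significance.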

3.3.1.2. Cell Culture Samples<br />

Before use, the cell culture supernatants should be centrifuged at 14,000×g<br />

to remove any particulates. The supernatants can be diluted in the<br />

corresponding cell culture medium. As with serum samples, cell culture<br />

supernatants should be aliquoted and stored at –80°C until use.<br />

3.3.1.3. Cerebrospinal, Synovial, and Pleural Fluids<br />

Precious samples of limited volume, such as cerebrospinal fluid (CSF) and<br />

synovial fluid, are ideal candidates for multiplex analysis. Animal serum should be added<br />

to synovial fluid to block binding by heterophilic antibodies and<br />

rheumatoid factor (RF), which can cause false positives. For cytokine<br />

assays, the samples may be filtered through a 50-kDa cutoff filter to remove the interfering<br />

antibodies. Another recently described method uses protein L to remove RF<br />

from serum (8). In one study, CSF samples were analyzed for 22 cytokines on the<br />

Luminex platform, and 11 cytokines were detected (9); the authors performed spike<br />

recovery experiments and described the recoveries as good.<br />



3.3.2. Diluent<br />

It is important that the diluents selected for reconstitution and dilution of<br />

the standards reflect the matrix of the samples being measured. Diluents<br />

for specific sample types have to be validated prior to use. For analyzing cell<br />

culture samples, the standards and samples are diluted in the respective cell<br />

culture medium. It is important to use the same lot of fetal bovine serum (FBS),<br />

as lots may differ significantly in ways that interfere with<br />

the assay. The pH of the sample should also be controlled, because it affects<br />

antibody binding. For assaying serum samples, each laboratory should develop<br />

and validate an appropriate diluent. We suggest starting with PBS supplemented<br />

with 10–50% animal serum (e.g., fetal calf serum, horse serum, goat serum, or<br />

depleted human serum). The goal is to mimic the serum matrix to ensure similar<br />

binding kinetics in serum samples and standards. Serum samples may<br />

also need to be diluted in buffer containing small amounts of serum to prevent false positives,<br />

because some human antibodies may react with the mouse capture antibodies.<br />

Generally, 1–2% serum of each antibody species is sufficient. The serum diluent<br />

must not be used to dilute the detection antibody or the streptavidin-PE.<br />

3.3.3. Detection Antibody<br />

The concentration of detection antibody used can be varied to create<br />

an immunoassay with different sensitivity and dynamic range. The authors<br />

typically use detection antibody at a concentration between 0.5 μg/ml and<br />

1.0 μg/ml. Optimization is necessary. The quantitative range of the assay can<br />

be shifted by changing the antibody concentration. The dilution of the detection<br />

antibody shifts the standard curve to the lower concentration range, whereas an<br />

increased concentration shifts the curve to the higher concentration range.<br />

3.3.4. General Protocol for Processing Bead-Based Multiplex Assays for<br />

the Determination of Proteins in Human<br />

1. Centrifuge the sample at 14,000×g to pellet any particulates before diluting<br />

it in the appropriate diluent. The dilution factors will vary depending on sample type<br />

and analyte concentration.<br />

2. Reconstitute the standard in the appropriate diluent and prepare an eight-point<br />

standard curve using twofold serial dilutions.<br />

3. Wet the filter plate with 100 μl assay buffer per well.<br />

4. Plate loading: add 50 μl of standard or sample to each well.<br />

5. Sonicate the coupled beads for 15–20 s to yield a homogeneous suspension.<br />

Thoroughly vortex the beads for at least 10 s.<br />

6. Dilute the beads to 1500 beads of each type per 25 μl, and add 25 μl of the diluted bead suspension<br />

to each well.<br />



7. Incubate for 2 h in the dark at room temperature (see Note 4).<br />

8. Washing step: Apply vacuum manifold to the bottom of filter plate to remove<br />

liquid. Wash by adding 100 μl of assay buffer. Repeat washing twice. Resuspend<br />

the beads in 75 μl of assay buffer.<br />

9. Add 25 μl of the detection antibody solution to each well.<br />

10. Incubate for 1.5 h in the dark at room temperature.<br />

11. Washing step: Apply vacuum manifold to the bottom of filter plate to remove<br />

liquid. Wash by adding 100 μl of assay buffer. Repeat washing twice. Resuspend<br />

the beads in 75 μl of assay buffer.<br />

12. Add 25 μl of Streptavidin-Phycoerythrin solution to each well.<br />

13. Incubate for 0.5 h in the dark at room temperature.<br />

14. Washing step: Apply vacuum manifold to the bottom of filter plate to remove<br />

liquid. Wash by adding 100 μl of assay buffer. Repeat washing twice. Resuspend<br />

the beads in 125 μl of assay buffer.<br />

15. Incubate on a plate shaker for 1 min.<br />

16. Read the results on the Luminex 100 instrument.<br />

17. Data evaluation: We recommend interpolating the sample concentrations from<br />

a four-parameter logistic (4-PL) or five-parameter logistic (5-PL) standard curve.<br />
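Back-calculating a sample concentration means inverting the fitted standard curve. The sketch below uses one common four-parameter logistic parametrization; the parameter names and example values are hypothetical. In practice the parameters would be fitted to the standards, for example with scipy.optimize.curve_fit, and the back-calculated value multiplied by the sample dilution factor.

```python
def four_pl(x, a, b, c, d):
    """Four-parameter logistic: a = response at zero concentration, d = response at
    infinite concentration, c = inflection point, b = slope factor."""
    return d + (a - d) / (1.0 + (x / c) ** b)


def back_calculate(y, a, b, c, d):
    """Concentration corresponding to a measured MFI y on a fitted 4-PL curve."""
    if not min(a, d) < y < max(a, d):
        raise ValueError("MFI lies outside the curve's asymptotes")
    # Solve y = d + (a - d) / (1 + (x/c)^b) for x.
    return c * ((a - d) / (y - d) - 1.0) ** (1.0 / b)


# Hypothetical fitted parameters for one analyte
a, b, c, d = 20.0, 1.2, 500.0, 15000.0
mfi = four_pl(250.0, a, b, c, d)
in_well = back_calculate(mfi, a, b, c, d)
print(round(in_well, 6))      # 250.0
sample_conc = in_well * 10    # e.g., correct for a 1:10 sample dilution
```

Values that back-calculate below the lowest or above the highest standard should be flagged rather than reported, since the curve is only constrained between the standards.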

3.3.5. Screening Protocol: 10-plex Soluble Receptor Assay for Serum<br />

Samples<br />

1. Reconstitute the standard in the appropriate diluent and prepare an eight-point<br />

standard curve using twofold serial dilutions.<br />

2. Block the plate with 100 μl B/S buffer (1% BSA in PBS).<br />

3. Beads: use 1500 beads of each color code per well.<br />

4. Prepare an eight-point standard series of the analyte mixture in 10% horse serum in B/S buffer<br />

by 1:2 serial dilutions. The highest concentrations used in the standard<br />

curves are as follows (ng/ml): IL-2R, 2; E-selectin, 6; ICAM, 5; Fas, 1; gp130, 2;<br />

TNFRI, 0.8; TNFRII, 1.5; RAGE, 2; VCAM, 5; MIF, 4.<br />

5. Prepare the samples by 1:10 dilution in B/S buffer.<br />

6. Add 30 μl beads and 30 μl sample (or standard) into the wells.<br />

7. Incubate and shake for 1.5 h at room temperature.<br />

8. Wash 3×, each time with 100 μl PBS.<br />

9. Prepare the detection antibody mixture in B/S buffer as shown below:<br />

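Detection antibody | Concentration (μg/mL)<br />
Anti-IL-2R | 0.4<br />
Anti-E-Selectin | 1<br />
Anti-ICAM | 0.4<br />
Anti-Fas | 0.4<br />
Anti-gp130 | 1<br />
Anti-TNFRI | 1<br />
Anti-TNFRII | 0.6<br />
Anti-RAGE | 0.8<br />
Anti-VCAM | 0.8<br />
Anti-MIF | 0.8<br />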


256 Hsu et al.<br />

10. Add 30 μl of the detection antibody mixture to each well, and incubate with shaking for 1 h at room temperature.<br />

11. Wash 3×, each time with 100 μl PBS.<br />

12. Prepare Streptavidin-PE solution (5 μg/mL) in B/S buffer and pipette 30 μl to<br />

each well.<br />

13. Incubate and shake for 30 min at room temperature.<br />

14. Wash 3×, each time with 100 μl PBS.<br />

15. Resuspend the beads in 100 μl B/S buffer.<br />

16. Read the plate on the Luminex 100 instrument.<br />

3.4. Validation of Analytical Performance of Miniaturized Multiplexed Protein Assays<br />

3.4.1. Accuracy<br />

Accuracy expresses the closeness of the measured value to the true value. It should be assessed using a minimum of five determinations at each of a minimum of three concentrations across the expected range of the assay. A deviation of up to 15% between the measured value and the true value is acceptable. Several methods for estimating accuracy are available:<br />

1. By comparing the measured analyte values with reference data.<br />

2. By adding known quantities of the analyte to an appropriate test matrix (e.g., serum, plasma). The recovery is then expressed as the measured analyte concentration relative to the expected concentration, that is, the background concentration of the matrix plus the added analyte concentration:<br />

$$\text{Recovery (\%)} = \frac{\text{Measured analyte concentration}}{\text{Background analyte concentration in test matrix} + \text{Added analyte concentration}} \times 100$$
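The recovery calculation can be sketched in Python as follows. Applying the 15% acceptance criterion from the accuracy guideline above to recovery (i.e., accepting 85–115%) is our assumption, and the spike values are hypothetical.

```python
def recovery_percent(measured, background, added):
    """Recovery (%) = measured / (background in test matrix + added) * 100."""
    expected = background + added
    if expected <= 0:
        raise ValueError("expected concentration must be positive")
    return measured / expected * 100.0


def recovery_acceptable(recovery, tolerance=15.0):
    """True if recovery deviates from 100% by at most `tolerance` percentage points."""
    return abs(recovery - 100.0) <= tolerance


# Hypothetical spike experiment (pg/mL): serum with 120 pg/mL endogenous analyte,
# spiked with 500 pg/mL and measured at 580 pg/mL after spiking.
r = recovery_percent(measured=580.0, background=120.0, added=500.0)
print(f"recovery = {r:.1f}%, acceptable = {recovery_acceptable(r)}")
# prints: recovery = 93.5%, acceptable = True
```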

3.4.2. Selectivity<br />

Selectivity can be assessed by cross-reactivity experiments in which the multiplex assay is run with each of the standards added separately. A signal should then appear only on the bead set carrying the capture antibody for that standard, confirming that each capture antibody in the assay is selective for its respective analyte.<br />
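One way to summarize such an experiment is a cross-reactivity table: for each standard run alone, the background-corrected signal on every other bead set is expressed as a percentage of the signal on the intended bead set. The Python below is a sketch under that assumption; the MFI values are hypothetical, bead sets are keyed by analyte name for simplicity, and the 1% flagging threshold is illustrative, not a value from this protocol.

```python
def cross_reactivity(mfi, blank):
    """Return {(standard, bead_set): % cross-reactivity} for off-target bead sets.

    mfi[standard][bead_set]: MFI on each bead set when only `standard` is added.
    blank[bead_set]: MFI of each bead set in the zero standard (no analyte).
    """
    result = {}
    for standard, row in mfi.items():
        on_target = row[standard] - blank[standard]
        for bead_set, value in row.items():
            if bead_set == standard:
                continue
            off_target = max(value - blank[bead_set], 0.0)
            result[(standard, bead_set)] = off_target / on_target * 100.0
    return result


# Hypothetical single-standard runs on a three-plex subset
blank = {"Fas": 40.0, "MIF": 45.0, "RAGE": 35.0}
mfi = {
    "Fas": {"Fas": 5000.0, "MIF": 45.0, "RAGE": 40.0},
    "MIF": {"Fas": 38.0, "MIF": 7200.0, "RAGE": 400.0},
    "RAGE": {"Fas": 42.0, "MIF": 50.0, "RAGE": 3035.0},
}
cr = cross_reactivity(mfi, blank)
flagged = [pair for pair, pct in cr.items() if pct > 1.0]  # illustrative threshold
print(flagged)
```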

3.4.3. Specificity<br />

Specificity is the ability of an assay to measure the amount of an analyte unequivocally in the presence of interfering substances. Non-specificity may arise from cross-reactivity of the antibodies used in the assay with other proteins or antibodies present in the sample.<br />


Miniaturized Parallelized Sandwich Immunoassays 257<br />

3.4.4. Precision<br />

Precision expresses the closeness of agreement between a series of repeated measurements. It should be assessed using a minimum of five determinations at each of a minimum of three concentrations across the expected range of the assay. The coefficient of variation (CV) should not exceed 15%.<br />

3.4.4.1. Repeatability<br />

Intra-assay precision, or repeatability, expresses the precision under constant conditions: the measurements are performed within 1 day by the same analyst using identical reagents and the same instrument.<br />

3.4.4.2. Reproducibility<br />

Inter-assay precision, or reproducibility, expresses the precision under changed measurement conditions, which may involve different analysts, reagents, instruments, and laboratories.<br />
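Repeatability and reproducibility are usually summarized as coefficients of variation, CV (%) = SD / mean × 100. A minimal sketch with hypothetical data for a single concentration level (the guideline above calls for at least three levels): the intra-assay CV is computed within each run, and the inter-assay CV across the run means. Using run means for the inter-assay CV is one common convention, assumed here.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation (%) using the sample standard deviation."""
    return stdev(values) / mean(values) * 100.0


# Hypothetical back-calculated concentrations (pg/mL) of one quality-control
# sample: three runs on different days, five replicate determinations each.
runs = [
    [510.0, 495.0, 502.0, 488.0, 505.0],
    [530.0, 521.0, 515.0, 526.0, 518.0],
    [480.0, 492.0, 486.0, 479.0, 490.0],
]

intra_cvs = [cv_percent(run) for run in runs]       # repeatability, per run
inter_cv = cv_percent([mean(run) for run in runs])  # reproducibility, across runs
acceptable = all(cv <= 15.0 for cv in intra_cvs) and inter_cv <= 15.0

print([round(cv, 2) for cv in intra_cvs], round(inter_cv, 2), acceptable)
```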

3.4.5. Limits of Detection and Quantitation (see Note 5)<br />

3.4.5.1. Detection Limit<br />

The limit of detection (LOD) is the lowest amount of analyte in a sample that can be detected, but not necessarily quantitated as an exact value. According to the IUPAC definition (2), the LOD is estimated as the mean of the zero standard signal plus three times the standard deviation (SD) of the zero standard signal:<br />

$$\mathrm{LOD} = \mathrm{Mean}_{\text{zero standard}} + 3 \times \mathrm{SD}_{\text{zero standard}}$$

3.4.5.2. Quantitation Limit<br />

The limit of quantitation (LOQ) is the lowest amount of analyte in a sample that can be quantitated with acceptable statistical significance. According to the IUPAC definition, the LOQ is estimated as the mean of the zero standard signal plus 10 times the SD of the zero standard signal:<br />

$$\mathrm{LOQ} = \mathrm{Mean}_{\text{zero standard}} + 10 \times \mathrm{SD}_{\text{zero standard}}$$
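Both limits can be computed from replicate readings of the zero standard, as in the minimal Python sketch below. The MFI values are hypothetical, the use of the sample SD (n − 1) is an assumption, and the results are signal thresholds; expressing them as concentrations requires back-calculating them through the fitted standard curve.

```python
from statistics import mean, stdev


def detection_limits(zero_standard_signals):
    """Return (LOD, LOQ) signal thresholds: mean + 3 SD and mean + 10 SD."""
    m = mean(zero_standard_signals)
    sd = stdev(zero_standard_signals)  # sample standard deviation (n - 1)
    return m + 3 * sd, m + 10 * sd


# Hypothetical MFI readings of the zero standard (blank), six replicate wells
zero_standard = [42.0, 38.5, 45.1, 40.3, 39.8, 44.0]
lod_mfi, loq_mfi = detection_limits(zero_standard)
print(f"LOD = {lod_mfi:.1f} MFI, LOQ = {loq_mfi:.1f} MFI")
```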

3.4.6. Linearity<br />

Linearity is defined as the ability of an analytical procedure to produce<br />

signals that are directly proportional to the analyte concentration of the sample.
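A common way to check this is an ordinary least-squares fit of measured against expected concentrations, for example across a dilution series of a sample: a slope near 1, an intercept near 0, and a coefficient of determination (R²) near 1 indicate a proportional response. The Python sketch below implements that fit; the data are hypothetical, and this chapter does not give acceptance limits for slope or R².

```python
from statistics import mean


def linear_fit(x, y):
    """Ordinary least squares: return (slope, intercept, R^2)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical twofold dilution series of one sample (pg/mL)
expected = [1000.0, 500.0, 250.0, 125.0, 62.5]
measured = [980.0, 510.0, 245.0, 128.0, 60.0]
slope, intercept, r2 = linear_fit(expected, measured)
print(f"slope = {slope:.3f}, intercept = {intercept:.1f}, R^2 = {r2:.4f}")
```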



3.4.7. Range<br />

The range of an analytical procedure is the interval between the upper and lower amounts of analyte within which the analyte can be quantified with a suitable level of accuracy, precision, and linearity.<br />

3.4.8. Robustness<br />

Robustness expresses the extent to which the measured values remain unaffected by small variations in method parameters such as temperature, reagent concentration, or instrumental parameters. It indicates the reliability of an analytical procedure during normal usage. Figure 3 shows the standard curves of the 10plex soluble receptor assay; these data demonstrate the feasibility and robustness of the assays.<br />

Fig. 3. Standard curves of the 10plex soluble receptors assay (MIF, VCAM, RAGE, TNFRII, TNFRI, gp130, Fas, ICAM, IL-2R, and E-Selectin), plotted as average MFI (log scale, 1–10,000) from several individual measurements against concentration (pg/ml, log scale, 10–100,000); standard deviation bars are included. The data reflect the linear range and the robustness of the assays.<br />

3.5. Pattern Generation<br />

After optimization of the assays, screening can be performed, generating large amounts of data. Several bioinformatic tools are available for handling such high-dimensional data sets. For example, clustering analysis that distinguishes different diseases or disease symptoms can lead to useful taxonomies, and correct identification of symptom clusters is essential for successful therapy.<br />

Table 1 summarizes the main features of CIMminer (Clustered Image Maps) (13) and MeV (MultiExperiment Viewer) (14), two platforms that can both be applied for these purposes. Unsupervised hierarchical clustering analysis can be performed using the online tool CIMminer, developed by the National Cancer Institute. MeV is a more integrated freeware package developed by TIGR (The Institute for Genomic Research) that provides 23 analysis modules. Its common clustering methods, such as HCL (Hierarchical Clustering) and ST (Support Trees), together with statistical methods such as TTEST (t-tests), SAM (Significance Analysis of Microarrays), ANOVA (Analysis of Variance), and TFA (Two-factor ANOVA), help users identify significant parameters. More sophisticated techniques are also available, including PCA (Principal Components Analysis), SOTA (Self Organizing Tree Algorithm), RN (Relevance Networks), KMC (K-Means/K-Medians Clustering), KMS (K-Means/K-Medians Support), CAST (Clustering Affinity Search Technique), QTC (QT CLUST), SOM (Self Organizing Maps), GSH (Gene Shaving), FOM (Figures of Merit), PTM (Template Matching), SVM (Support Vector Machines), KNNC (K-Nearest-Neighbor Classification), DAM (Discriminant Analysis Module), COA (Correspondence Analysis), TRN (Expression Terrain Maps), and EASE (Expression Analysis Systematic Explorer).<br />
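To illustrate what unsupervised hierarchical clustering (HCL) does, the Python below is a minimal, self-contained sketch of agglomerative clustering with Euclidean distance and average linkage. It is not the CIMminer or MeV implementation; the distance metric, the linkage rule, and the serum profiles (hypothetical log10 concentrations of three analytes) are assumptions chosen for the example.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(profiles):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""
    def distance(c1, c2):
        # Average linkage: mean pairwise distance between members of two clusters
        return sum(euclidean(profiles[a], profiles[b])
                   for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [frozenset([name]) for name in profiles]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of current clusters
        d, i, j = min(
            (distance(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


def clusters_at(merges, names, k):
    """Replay the merges until k clusters remain (i.e., cut the dendrogram)."""
    groups = [frozenset([name]) for name in names]
    for a, b, _ in merges:
        if len(groups) <= k:
            break
        groups = [g for g in groups if g not in (a, b)] + [a | b]
    return groups


# Hypothetical log10 concentrations of three analytes for six sera
profiles = {
    "P1": [3.1, 2.8, 3.5], "P2": [3.0, 2.9, 3.4], "P3": [3.2, 2.7, 3.6],
    "C1": [2.1, 2.0, 2.4], "C2": [2.0, 2.2, 2.5], "C3": [2.2, 2.1, 2.3],
}
merges = average_linkage(profiles)
for group in clusters_at(merges, list(profiles), k=2):
    print(sorted(group))
```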

Table 1<br />
Comparison of the Main Features in CIMminer and MeV<br />

Feature | CIMminer | MeV<br />
Contributor | NCI | TIGR<br />
Analysis platform | Web-based (http://discover.nci.nih.gov/cimminer/) | Off-line; free software (http://www.tm4.org/mev.html)<br />
Input file | ".txt", ".zip" | ".txt", ".mev", ".tav", ".gpr"<br />
Order algorithm | More | Less<br />
Statistical analysis | No | Yes; significant parameters can be identified<br />
Results | Color-coded image | Color-coded image<br />
Reference | Science 1997; 275:343–9 | Biotechniques 2003; 34:374–8<br />



4. Notes<br />

1. This method can also be adapted for coupling reactions of antigens, receptors, or<br />

other proteins.<br />

2. Minimize the exposure of EDC and Sulfo-NHS to air, and close containers tightly.<br />

Use fresh aliquots for each coupling reaction and discard after use.<br />

3. Sulfo-NHS solution (50 mg/ml) can be prepared and stored at –20°C.<br />

4. Incubation time can be varied. The authors typically incubate between 30 min and<br />

2 h. The primary incubation of the bead and sample can be performed overnight<br />

at 4°C for greater low-end sensitivity.<br />

5. The detection limit is primarily dependent on the quality of the antibodies<br />

used. Additionally, the detection limit is influenced by detection conditions (e.g.,<br />

antibody concentration, incubation time), complexity of the multiplex assay, and<br />

matrix proteins.<br />

References<br />

1. Morgan, E., Varro, R., Sepulveda, H., Ember, J.A., Apgar, J., Wilson, J., Lowe, L.,<br />

Chen, R., Shivraj, L., Agadir, A., Campos, R., Ernst, D., Gaur, A. (2004)<br />

Cytometric bead array: a multiplexed assay platform with applications in various<br />

areas of biology. Clin Immunol, 110, 252–66<br />

2. Dasso, J., Lee, J., Bach, H., Mage, R.G. (2002) A comparison of ELISA and<br />

flow microsphere-based assays for quantification of immunoglobulins. J Immunol<br />

Methods, 263, 23–33<br />

3. Carson, R.T., Vignali, D.A. (1999) Simultaneous quantitation of 15 cytokines using<br />

a multiplexed flow cytometric assay. J Immunol Methods, 227, 41–52<br />

4. Dunbar, S.A., Vander Zee, C.A., Oliver, K.G., Karem, K.L., Jacobson, J.W. (2003)<br />

Quantitative, multiplexed detection of bacterial pathogens: DNA and protein applications<br />

of the Luminex LabMAP system. J Microbiol Methods, 53, 245–52<br />

5. Joos, T.O., Stoll, D., Templin, M.F. (2002) Miniaturised multiplexed immunoassays.<br />

Curr Opin Chem Biol, 6, 76–80<br />

6. Prabhakar, U., Eirikis, E., Davis, H.M. (2002) Simultaneous quantification of<br />

proinflammatory cytokines in human plasma using the LabMAP assay. J Immunol<br />

Methods, 260, 207–18<br />

7. Lu, J., Getz, G., Miska, E.A., Alvarez-Saavedra, E., Lamb, J., Peck, D., Sweet-<br />

Cordero, A., Ebert, B.L., Mak, R.H., Ferrando, A.A., Downing, J.R., Jacks, T.,<br />

Horvitz, H.R., Golub, T.R. (2005) MicroRNA expression profiles classify human<br />

cancers. Nature, 435, 834–8<br />

8. de Jager, W., Prakken, B.J., Bijlsma, J.W., Kuis, W., Rijkers, G.T. (2005) Improved<br />

multiplex immunoassay performance in human plasma and synovial fluid following<br />

removal of interfering heterophilic antibodies. J Immunol Methods, 300, 124–35<br />

9. Natelson, B.H., Weaver, S.A., Tseng, C.L., Ottenweller, J.E. (2005) Spinal fluid<br />

abnormalities in patients with chronic fatigue syndrome. Clin Diagn Lab Immunol,<br />

12, 52–5



10. Findlay, J.W., Smith, W.C., Lee, J.W., Nordblom, G.D., Das, I., DeSilva, B.S.,<br />

Khan, M.N., Bowsher, R.R. (2000) Validation of immunoassays for bioanalysis: a<br />

pharmaceutical industry perspective. J Pharm Biomed Anal, 21, 1249–73<br />

11. Sanchez-Carbayo, M. (2006) Antibody arrays: technical considerations and clinical<br />

applications in cancer. Clin Chem, 52, 1651–9<br />

12. Kingsmore, S.F. (2006) Multiplexed protein measurement: technologies and applications<br />

of protein and antibody arrays. Nat Rev Drug Discov, 5, 310–20<br />

13. Weinstein, J.N., Myers, T.G., O’Connor, P.M., et al. (1997) An information-intensive<br />

approach to the molecular pharmacology of cancer. Science, 275, 343–9<br />

14. Saeed, A.I., Sharov, V., White, J., Li, J., Liang, W., Bhagabati, N., Braisted, J.,<br />

Klapa, M., Currier, T., Thiagarajan, M., Sturn, A., Snuffin, M., Rezantsev, A.,<br />

Popov, D., Ryltsov, A., Kostukovich, E., Borisovsky, I., Liu, Z., Vinsavich, A.,<br />

Trush, V., Quackenbush, J. (2003) TM4: a free, open-source system for microarray<br />

data management and analysis. Biotechniques, 34(2), 374–8<br />


15<br />

Dissecting Cancer Serum Protein Profiles Using<br />

Antibody Arrays<br />

Marta Sanchez-Carbayo<br />

Summary<br />

Antibody arrays are one of the high-throughput techniques that enable the simultaneous detection of multiple proteins. One of the main advantages of the technology over other proteomic approaches is that the identities of the measured proteins are known at the outset of the experimental design, or can be readily characterized, which facilitates biological interpretation of the results. This chapter gives an overview of the technical issues of the main antibody array formats, as well as various applications using serum specimens in the context of neoplastic diseases. Clinical applications of antibody arrays range from biomarker discovery for diagnosis, prognosis, and drug response to the characterization of protein pathways and modification changes associated with disease development and progression. As a high-throughput tool addressing protein levels and post-translational modifications, it improves the functional characterization of the molecular bases of cancer. Furthermore, the identification and validation of protein expression patterns characteristic of cancer progression and tumor subtypes may enable tailored therapeutic intervention and improve the clinical management of cancer patients. Technical features such as low sample volume and antibody concentration requirements, format versatility, and high reproducibility support their increasing impact on cancer research.<br />

Key Words: antibody arrays; protein profiling; serum; direct labeling.<br />

From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols<br />
Edited by: A. Vlahou © Humana Press, Totowa, NJ<br />

1. Introduction<br />

1.1. Antibody Arrays in the Context of Other Proteomic Strategies<br />

Two main proteomic strategies can be used to investigate the cancer proteome: untargeted and targeted. The terminology refers to whether the proteins to be measured are unknown and identified during the analysis (untargeted approach) or are known and built into the experimental design (targeted strategies). Untargeted platforms are best suited for first-pass comparisons of proteomes to identify relatively few novel or known proteins that exhibit the greatest differences in abundance. The two most commonly used technologies are two-dimensional electrophoresis (2D) and low- and high-resolution mass spectrometry (1,2,3). Targeted proteomic platforms measure and quantify previously identified proteins of interest, and are suited for analyzing quantitative differences in abundance among known protein families and pathways. The versatility of targeted platforms allows reproducibility, scalability, and precise quantification to be controlled and estimated, leading to high sensitivity and coverage. This approach allows experimental designs to address specific hypotheses and facilitates biological interpretation of the results. However, the number of proteins amenable to these analyses depends on the availability of antibodies that bind the target protein with high affinity and specificity. The main targeted techniques used for large-scale analysis of many samples and proteins include protein microarrays, multiplexed Western blots, and tissue arrays. Protein arrays are the most versatile of the proteomic techniques available to date, since antigens, peptides, complex protein solutions, or antibodies can be immobilized to capture and quantify specific antibodies or proteins, respectively (1,2,3,4).<br />

1.2. Antibody Array Formats<br />

Innovations in immobilization surfaces and detection strategies have led to a growing number of planar antibody array technologies and bead-based versions. Planar antibody arrays are the most common type of protein array and are the main focus of this chapter. This section describes the main formats of planar arrays and how they differ from bead-based assays (Fig. 1; for bead-based arrays, see also Chapter 14).<br />

The main planar label-based types comprise one-antibody assays (using one antibody to capture the target molecule) and sandwich assays (using two antibodies to capture and detect the target protein) (1,2,3,4). Each has advantages and drawbacks relative to the other. In one-antibody label-based assays, the target proteins are captured by an immobilized antibody and detected through labeling with a tag (Fig. 1A). In direct labeling, the proteins are labeled with a fluorophore, such as a cyanine dye (Cy3 or Cy5). In indirect labeling, the proteins are labeled with a tag that is later detected by a labeled antibody. One-antibody label-based assays allow two different samples, each labeled with a different tag, to be incubated on the same array. Normalization is facilitated by co-incubating a reference sample with a test sample (1,2,3,4).<br />


[Figure 1 panels: antibody-based arrays (direct and indirect labeling with Cy3/Cy5, biotin, or digoxigenin; competitive format; sandwich detection by RCA, RLS, ECL, or TSA/Bio-SA-Cy3; suspension bead-based arrays) and antigen-based arrays (reverse-phase arrays of complex lysates; tumor-associated antigen arrays detecting autoantibodies, e.g., anti-p53, against tumor antigens, e.g., p53).]<br />

Fig. 1. Main formats of planar and suspension protein arrays. RCA: rolling-circle amplification; RLS: resonance light scattering; ECL: enhanced chemiluminescence; TSA: tyramide signal amplification; SA: streptavidin.<br />

Another benefit is that these assays are competitive, since the analytes in the test and reference solutions compete for binding to the antibodies (1,2,3,4). This improves the linearity of response and the dynamic range compared with non-competitive assays (4). The main disadvantage is that the label may disrupt the antibody–antigen interaction, which may also limit detection, sensitivity, and specificity.<br />
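For the two-color competitive format, the co-incubated reference is typically used by expressing each antibody spot as a log ratio of test to reference signal. The Python below is a minimal sketch under stated assumptions: Cy5 carries the test sample and Cy3 the reference (a hypothetical assignment), local background is subtracted with a floor of 1 to avoid taking the log of non-positive values, and ratios are median-centered on the assumption that most proteins do not differ between samples. Dye-specific bias is not corrected here; dye-swap replicates are a common way to address it.

```python
import math
from statistics import median


def normalized_log_ratios(spots):
    """spots: (antibody, test_fg, test_bg, ref_fg, ref_bg) intensity tuples.

    Returns the median-centered log2(test / reference) for each antibody spot."""
    raw = {}
    for antibody, test_fg, test_bg, ref_fg, ref_bg in spots:
        test = max(test_fg - test_bg, 1.0)  # background-subtracted test (Cy5) signal
        ref = max(ref_fg - ref_bg, 1.0)     # background-subtracted reference (Cy3) signal
        raw[antibody] = math.log2(test / ref)
    center = median(raw.values())
    return {antibody: value - center for antibody, value in raw.items()}


# Hypothetical spot intensities (foreground, background) for three antibodies
spots = [
    ("anti-A", 2100.0, 100.0, 1100.0, 100.0),
    ("anti-B", 1100.0, 100.0, 1100.0, 100.0),
    ("anti-C", 600.0, 100.0, 1100.0, 100.0),
]
ratios = normalized_log_ratios(spots)
print(ratios)
```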

In the sandwich label-based format, antibodies capture unlabeled proteins, which are then detected by a second antibody using one of several signal-generation methods (Fig. 1B). Because two antibodies target each analyte, specificity is higher than in one-antibody label-based assays, and the reduced background also increases sensitivity. The sandwich format allows only non-competitive assays, since only one sample can be incubated on each array (1,2,3,4). This results in a sigmoidal binding response, as opposed to the linear response of the competitive format, and requires standard curves of known analyte concentrations for accurate calibration (4). Sandwich assays are also more difficult to multiplex than one-antibody label-based assays, since matched antibody pairs and purified antigens may not be available for each target, and the potential cross-reactivity among detection antibodies increases with each additional analyte (2,4). Currently, the practical size of multiplexed sandwich assays is limited to 30–50 different targets (1,2,3,4). In contrast, in one-antibody assays only the availability of antibodies and space on the substrate limit the number of targets that can be analyzed.<br />

In addition to planar arrays, suspension or bead-based arrays use different fluorescent beads, each coated with a different antibody and spectrally resolvable from the others [(5,6,7,8,9) and see Chapter 14]. The beads are incubated with a sample to allow protein binding to the capture antibodies, and the mixture is then incubated with a cocktail of detection antibodies, each corresponding to one of the capture antibodies. The detection antibodies are tagged to allow fluorescent detection. The beads are passed through a flow cytometer, where each bead is probed by two lasers: one reads the color, or identity, of the bead, and the other reads the amount of detection antibody on the bead (5,6,7,8,9). Multiplexed bead-based flow cytometry assays are an active area of development. Differentially identifiable beads coated with proteins, autoantigens, or antibodies can identify a variety of bound antibodies or proteins using a cytometer (5,6,7,8,9). Advances in instrumentation and bead chemistries will probably make this approach very valuable for detecting circulating cancer cells in clinical practice. In another version of this concept, cell suspensions can be incubated on antibody arrays, and the number of cells bound to each antibody can be quantified by dark-field microscopy. These arrays could characterize multiple membrane proteins in specific cell populations, or changes in cell surfaces induced by drug therapies.<br />
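In the bead-based readout, the classification laser assigns each bead event to a bead region (its color code), and the reporter signals of all events in a region are often summarized as a median fluorescence intensity. The Python sketch below assumes classification has already been done; the region-to-analyte map, the minimum bead count, and the events are hypothetical and do not come from any instrument software.

```python
from collections import defaultdict
from statistics import median

# Hypothetical mapping of bead regions (color codes) to analytes
REGION_TO_ANALYTE = {12: "IL-2R", 18: "Fas", 25: "MIF"}


def median_fi_per_analyte(events, min_events=50):
    """events: (bead_region, reporter_intensity) pairs, one per bead.

    Returns {analyte: median reporter intensity} for regions with enough beads."""
    by_region = defaultdict(list)
    for region, reporter in events:
        if region in REGION_TO_ANALYTE:  # ignore unassigned regions
            by_region[region].append(reporter)
    return {
        REGION_TO_ANALYTE[region]: median(values)
        for region, values in by_region.items()
        if len(values) >= min_events  # too few beads: no reliable value
    }


events = (
    [(12, 1000.0 + i) for i in range(61)]  # 61 IL-2R beads
    + [(18, 200.0)] * 55                   # 55 Fas beads
    + [(25, 5000.0)] * 10                  # only 10 MIF beads
    + [(99, 7.0)] * 100                    # unassigned region
)
print(median_fi_per_analyte(events))
```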

It is important to distinguish antibody arrays from two other main protein array formats that can also be applied to serum samples and that are likewise based on the binding of antibodies to specific antigens. Tumor-associated antigen (TAA) arrays have been developed to detect autoantibodies against TAAs for cancer diagnosis (Fig. 1C). The rationale is that cancer sera contain antibodies that react with a unique group of autologous cellular antigens, known as TAAs (10,11). In addition, complex protein extracts can be spotted onto membranes and probed with antibodies targeting specific proteins on so-called reverse-phase arrays (12,13) (Fig. 1D).<br />

1.3. Types of Planar Antibody Arrays Based on the Labeling-Hybridization Methods<br />

The growing range of detection modalities has led to several types and applications of antibody arrays (see Note 1). A number of labeling and detection methods can be employed for one-antibody and sandwich label-based planar arrays (Fig. 2). First, the signal can be generated by a fluorescently labeled detection antibody (Fig. 2A). This approach, the standard sandwich array, requires chemical labeling of all secondary detection antibodies, but the assay is a simple two-step procedure that does not require a separate staining step (14,15). A second approach employs a species-specific, fluorescently labeled tertiary antibody (Fig. 2B). This option avoids the use of large, chemically modified detection antibodies, but restricts the species that can be used for the capture antibodies. A third option is to use available biotinylated detection antibodies (Fig. 2C) (15). In these assays, detection occurs after staining of the sandwich complex with Cy3-labeled streptavidin or other streptavidin conjugates, such as Texas Red conjugates or streptavidin-R-phycoerythrin (SAPE) (15). In the fourth option, the fluorescent signal is further amplified using a second layer of SAPE coupled to the first layer via an anti-SAPE antibody (Fig. 2D). Alternatively, in the fifth option, the number of biotin labels can be increased via tyramide signal amplification (Fig. 2E) (2): an antibiotin horseradish peroxidase (HRP) conjugate generates a tyramide radical that cross-links a biotin or a fluorophore to all exposed tyrosine residues of any protein near the recognition event (2). As a sixth possibility, chemiluminescence can be implemented in multiplexed sandwich assays (Fig. 2F), using streptavidin-HRP or a species-specific antibody conjugated with HRP or alkaline phosphatase, together with chemiluminescence substrates. Chemiluminescence is typically more sensitive than standard fluorescence detection. A polymer decorated with streptavidin and europium chelates is used not only for microplate but also for microarray measurements. Evanescent waveguide technology is employed as an alternative for ultrasensitive fluorescence detection (16). Rolling-circle amplification can be applied as a seventh option for signal generation (Fig. 2G). The 5′ end of an oligonucleotide primer is attached to an antibiotin antibody (17). After the antibiotin antibody binds the biotinylated detection antibody of the sandwich, the oligonucleotide is enzymatically extended using a circular DNA sequence as template. Fluorescently labeled short oligonucleotides are then hybridized to the extended DNA, decorating each bound antibody with thousands of fluorophores (15). An eighth staining method, with sensitivity similar to evanescent wave technology and rolling-circle amplification, uses colloidal gold particles coated with an antibiotin antibody (18). Because of resonance light scattering (RLS), these particles scatter white light very intensely, and quantitative readouts of miniaturized sandwich assays can be obtained with a simple charge-coupled device (CCD) camera-based imaging system (18) (Fig. 2H). RLS particles do not show any photobleaching, as compared with fluorescence or chemiluminescence (14,15,16,17,18,19,20).<br />

[Figure 2 panels: (A) antibody direct sandwich; (B) species-specific tertiary antibody; (C) biotinylated antibodies with fluorescent streptavidin conjugates; (D) two SAPE layers; (E) tyramide signal amplification; (F) alkaline phosphatase linked to a species-specific tertiary antibody, activated chemiluminescence; (G) rolling-circle amplification; (H) resonance light scattering.]<br />

Fig. 2. Several labeling and detection methods can be employed for antibody arrays.<br />

Given the wide range of labeling-hybridization methods available to date, this chapter describes in detail the reagents and protocol for direct labeling of serum specimens, as summarized in Figure 3.<br />

1.4. Applications in Cancer Research Using Serum Specimens<br />

Direct labeling methods have been applied in cancer diagnostics to detect proteins in the serum of patients with prostate cancer (21). The use of a two-color rolling-circle amplification method improves the detection of low-abundance proteins. This method has also been shown to provide adequate reproducibility and accuracy for protein profiling of serum specimens and for clinical applications (17,22,23,24). Sandwich assays can also measure protein abundance in body fluids using detection methods such as RLS (25), enhanced chemiluminescence (26), tyramide signal amplification (27), and fluorescence (28).<br />

Reverse-phase protein arrays have also been optimized for spotting serum specimens, enabling high-throughput measurement of IgA in thousands of sera in a single experiment (29). In another application, a recent report designed antibody arrays for bladder cancer by selecting antibodies against targets that gene profiling had identified as differentially expressed in bladder tumors (24). Serum protein profiles obtained with two independent antibody arrays represent comprehensive means for bladder cancer diagnosis and stratification by clinical outcome (24). Validation analyses with ELISA and with immunohistochemistry on tissue microarrays are alternative approaches to confirm the relevance of the identified proteins for tumor progression. Such a strategy provides experimental evidence for the use of several integrated technologies and strengthens the process of biomarker discovery.<br />

Serum specimens can be used to profile the humoral immune signature of cancer patients, detecting both autoantibodies against tumor antigens and secreted cytokines. The combined detection of antibodies against a group of TAAs has provided high sensitivity for the diagnosis of prostate cancer (10). The use of phage display arrays can enhance the tumor subtype specificity of such measurements (10,11). Cytokine profiling of serum and plasma specimens can differentiate cancer patients from control subjects, and can also stratify patients with leukemia by clinical outcome. Several reports have also compared the reproducibility of, and differences among, the technologies available for multiplexed cytokine measurements, including planar and bead-based antibody arrays (5,6,7).<br />

In summary, antibody arrays can be used for (1) discovering candidate disease biomarkers (21,24); (2) characterizing signaling pathways (28), disease progression, clinical subtypes, and outcomes (21,24); (3) measuring changes in post-translational modifications or expression levels of disease-related proteins (28); (4) identifying binding partners of proteins, which is especially important in functional studies for drug discovery; and (5) epitope mapping, that is, determining the regions of proteins that bind specific antibodies.<br />

2. Materials<br />

2.1. Printing of Antibody Arrays<br />

1. Antibodies. A critical step is the selection of the antibodies to be printed onto the antibody arrays. The antibodies should be selected based on their known affinity characteristics and the experimental design (see Note 2).<br />

2. Antibody purification with Affi-gel Protein A MASP II kit (Bio-Rad, Hercules,<br />

CA).<br />

3. Protein concentration measurements with BCA Protein Assay (Pierce, Rockford,<br />

IL).<br />

4. Fast Slides (Schleicher and Schuell Biosciences, Keene, NH) or HydroGel coated<br />

glass microscope slides (Perkin Elmer Life Sciences, Waltham, MA).<br />

5. Polypropylene 384-well microtiter plates (Genetix, New Milton, Hampshire, UK<br />

or MJ Research, Waltham, MA).<br />

6. Aluminum foil sealing tape (Scotch brand; R.S. Hughes, Sunnyvale, CA).<br />

7. Printer.<br />

2.2. Labeling and Hybridization of Serum Samples

1. NHS-ester Cy3 and Cy5 protein labeling reagents (Amersham, GE Healthcare, Piscataway, NJ).


270 Sanchez-Carbayo

2. Microscope slide staining chamber with slide racks (Shandon Lipshaw, Pittsburgh, PA).

3. Diamond scribe (VWR, West Chester, PA).

4. Hydrophobic marker (PAP pen, Immunotech, Marseille, France).

5. Coverslips (LifterSlip, Erie Scientific, Portsmouth, NH).

6. Wafer-handling tweezers (Technitool, West Berlin, NJ).

7. Clinical centrifuge with flat swinging buckets for holding slide racks.

8. Spin columns for protein cleanup (Bio-Rad Micro Bio-Spin P-6).

9. Microcon YM-50 microconcentrators (Millipore, Bedford, MA).

10. Complete protease inhibitor cocktail (Roche, Indianapolis, IN).

11. Buffers:

• Phosphate-buffered saline (PBS), pH 7.4: 137 mM NaCl, 4.3 mM Na₂HPO₄, 1.4 mM KH₂PO₄.

• Carbonate buffer, pH 8.5: 50 mM NaHCO₃.

• PBST: PBS containing 0.5% (v/v) Tween-20.

• 0.1 M PBS, pH 7.2: 68.4 ml 1 M Na₂HPO₄, 31.6 ml 1 M NaH₂PO₄, 900 ml dH₂O.

• NP40 lysis buffer: 50 mM HEPES-OH, EDTA, 50 mM NaCl, 10 mM NaPPi (tetrasodium diphosphate decahydrate), 50 mM NaF, 1% (v/v) NP40, 10 mM sodium vanadate, pH 7.5–8.0.

• Saturated NaCl (Sigma).

• Blocking buffer: 1% (w/v) bovine serum albumin (BSA) in PBST.

• 7–10 mM dye stock in DMSO: dissolve one tube of Cy3 or Cy5 dye in 30 μl of DMSO. Aliquot and freeze at –80°C.

2.3. Detection

1. ScanArray microarray scanner operating at 543 nm and 633 nm (Packard BioScience, Meriden, CT).

2. GenePix Pro 3.0 software (Axon Instruments, Union City, CA) for quantifying the image data.

3. Methods

Setting up custom-made antibody arrays involves three main steps: antibody array construction, sample labeling and hybridization onto the antibody array, and scanning and data analysis. The success of the whole process depends greatly on two things: the availability of high-quality antibodies for capturing the target proteins, and serum samples that are well handled, well preserved, and well characterized.

3.1. Antibody Array Construction

1. Select the antibodies (see Note 2).

2. Purify the antibodies (see Note 3).

3. Quantify the antibodies and store them under conditions that keep them stable (see Notes 4–7).

4. Prepare the print plate: dispense 5–7 μl of antibody solution into each well of a 384-well plate (see Note 8).

5. Prepare the slides for printing (see Note 9).


Dissecting Cancer Serum Protein Profiles 271

For nitrocellulose slides, no preparation is needed (see Note 9).

For hydrogel slides, prepare the slides just before use (i.e., only when you are ready to print the arrays). Load the hydrogel slides into a slide rack, rinse briefly (1 s) in purified water, and then wash three times for 10 min each in purified water at room temperature with gentle rocking. A microscope slide staining chamber is useful for the washing steps. These chambers come with slide racks that hold 10–30 slides, and the racks can be moved between chambers containing different washing buffers and into a clinical centrifuge for drying the slides.

6. Dry the slides by centrifugation at no more than 350 × g for about 3 min. A clinical centrifuge with flat swinging-bucket holders works well for this. Place a layer of paper towel on the bottom of each bucket to absorb the water removed from the slides, and set the slide rack on the paper towel.

7. Place the hydrogel slides in a staining chamber lined with paper towel at the bottom, and incubate in a 40°C water bath for 20 min.

8. Remove the slides from the water bath and allow them to cool at room temperature for 5 min. The slides are now ready for printing.

9. Print the antibodies on the slides (see Note 10).

10. Start the post-print processing of the microarrays.

For hydrogels:

• Prepare staining chambers with a paper towel soaked in saturated NaCl at the bottom.

• After printing, incubate the slides overnight at room temperature in the humidified staining chamber to allow the antibodies to adsorb to the matrix.

• The next day, circumscribe the array boundaries on each slide with a hydrophobic marker (e.g., PAP pen), leaving at least 3–4 mm between the array and the marker line. Allow the marker lines to dry fully.

For nitrocellulose (FAST, Schleicher and Schuell) slides:

• Allow the slides to dry for at least 1 h in a slide-staining chamber.

• Store them on a slide rack in a humidified staining chamber in the refrigerator.

• The next day, circumscribe the array boundaries on each slide with a hydrophobic marker (e.g., PAP pen), leaving at least 3–4 mm between the array and the marker line. Allow the marker lines to dry fully.

11. Rinse the slides as follows:

a. Rinse briefly (30 s) in PBST.

b. Wash in PBST for 3 min with gentle rocking.

c. Wash in PBST for 30 min with gentle rocking.



[Fig. 3 schematic: test proteins react with Cy5 and reference proteins react with Cy3. Each mixture is separated from free dye, and the two are then mixed and placed on the antibody-coated slide. After reacting on the slide, the array is scanned.]

Fig. 3. Scheme of the whole process when working with custom-made antibody arrays. Once antibodies are selected and printed on the arrays, serum samples are labeled and hybridized onto the antibody arrays. Scanning and data analyses of fluorescence will provide quantitative measurement of multiple proteins simultaneously.

12. Block the slides. Once the antibodies are immobilized, the remaining non-specific protein-binding sites on the printed microarrays must be blocked. Typical blocking solutions include diluted BSA or casein solutions (1,2,9,12,19). Prepare the blocking buffer right before use. If the arrays will not be used for a day or more, add sodium azide to the blocking buffer and leave the arrays in the BSA blocking solution in the refrigerator. When they are needed, begin with step b below:

a. Block in blocking buffer for 1 h at room temperature with constant shaking.

b. Rinse briefly with PBST twice, or alternatively perform the second rinse with 0.1 M PBS, pH 7.2, for 20 min.

c. Immediately before incubating the slides with the labeled samples, dry them by centrifugation in a clinical centrifuge with flat swinging-bucket holders.
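Steps 6 and 12 specify centrifugation as relative centrifugal force (× g), but many clinical centrifuges are set in rpm. The Python sketch below converts between the two using the standard relation RCF = ω²r/g, where ω is the angular velocity and r is the distance from the rotor axis. The 15-cm radius in the usage lines is a hypothetical value. Substitute the effective radius of your own swinging-bucket rotor, measured at the position of the slides.

```python
import math

STANDARD_GRAVITY_CM_S2 = 980.665  # standard gravity, cm/s^2


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at radius_cm for a rotor speed in rpm."""
    omega = 2 * math.pi * rpm / 60.0  # angular velocity in rad/s
    return omega ** 2 * radius_cm / STANDARD_GRAVITY_CM_S2


def rpm_for_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) that produces the requested RCF at radius_cm."""
    omega = math.sqrt(rcf * STANDARD_GRAVITY_CM_S2 / radius_cm)
    return omega * 60.0 / (2 * math.pi)


if __name__ == "__main__":
    radius = 15.0  # hypothetical radius to the outermost slide, in cm
    max_rpm = rpm_for_rcf(350.0, radius)
    print(f"Stay at or below {max_rpm:.0f} rpm for 350 x g at r = {radius} cm")
```

RCF grows linearly with r, so the same rpm gives a higher force at the outer edge of a slide rack. Computing the speed with r measured to the outermost slide keeps every slide at or below the 350 × g limit.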

3.2. Labeling of Samples and Hybridization

The direct labeling protocol below is summarized in Fig. 3.

1. Select the serum samples for labeling (see Note 11).

2. Determine the volume of each serum sample to label with Cy3 and with Cy5. When deciding whether to label the samples or the references with Cy3 or Cy5, note that Cy3 gives more consistent and brighter signals. For the samples, divide the volume to be placed on the array by the desired dilution factor (final dilutions range from 1/30 to 1/50). For a 20-μl volume (the volume used for a standard 12 × 12-mm hydrogel) and a 1/50 final dilution, use 0.4 μl of serum sample (20/50) per array.



If a pooled reference is used, each component of the reference is labeled first and then pooled (as opposed to pooling and then labeling). The volume of each reference component to be labeled is (Va × A)/Nr. Here Va is the serum volume per array (0.4 μl in the above case), A is the number of arrays in which the reference will be used, and Nr is the number of samples pooled in the reference. For example, if a pool of 10 samples will serve as the reference for 20 arrays, the volume of each sample used in the reference labeling mix will be (0.4 × 20)/10 = 0.8 μl.

3. Dilute each serum sample approximately 15× with carbonate buffer or phosphate buffer, pH 7.5. If the dinitrophenol (DNP) flag is to be used for normalization, spike the buffer with 0.5 μg/ml DNP flag. Do not use buffers containing an amine group, such as Tris base, because amines react with the NHS-ester dyes.

4. Add 1/20 volume of dye stock to each sample. The final concentration of the NHS-ester-activated Cy dyes in the serum protein solution should be between 100 and 300 μM (each vial of dye contains 200 nmol).

5. Mix each dye solution with its serum protein solution and let the reaction proceed on ice in the dark for 2 h. Normally, the reference protein solution is mixed with the Cy3 dye solution and the test protein solution with the Cy5 dye solution.

6. Quench (stop) the labeling by adding 1/20 volume of 1 M Tris-HCl, pH 7.5–8.0 (or glycine), to each reaction, so that the quencher is present in at least 200-fold molar excess over the dye.

7. Remove the unreacted dye by loading the samples onto a spin column or microconcentrator with an appropriate molecular weight cutoff. For a Bio-Rad Micro Bio-Spin 6 column, spin at 1000 × g for 2 min. A 3000-Da cutoff retains most proteins while still removing the dye; if smaller proteins are not important, a 10,000-Da cutoff is faster. Centrifuge according to the microconcentrator instructions. The 10,000-Da Microcon typically requires 20 min, and the 3000-Da Microcon 80 min, of centrifugation at 10,000 × g at room temperature.

8. Make 10× blocking solution: 30% (w/v) non-fat dry milk and 1% (v/v) Tween-20 in PBS (e.g., 3 g milk in 10 ml buffer).

9. Centrifuge the milk solution at 10,000 × g for 10 min to remove particulate matter.

10. After the spin-column step, take the labeled sample recovered in the collection tube (the flow-through) and add 1 μl of the blocking-mix supernatant and 1 μl of 10× protease inhibitor per array.

11. Pool the reference samples and divide the pool among the test samples according to the experimental plan.

12. If necessary, add 1× PBS to bring the volume to 20–25 μl per array. The labeled samples may be stored overnight at 4°C.

13. Start the hybridization of the labeled serum samples on the printed antibody arrays. Distribute the Cy3-labeled reference protein solution among the appropriate Cy5-labeled test protein solutions, and add PBS to each mix to a volume of 20–25 μl per array. Particulate matter or precipitate should then be removed, either by (1) filtering through a 0.45-μm spin filter or (2) centrifuging for 10 min at 14,000 × g and pipetting off the supernatant.

14. Load the appropriate volume of labeled sample onto each slide within the marked boundaries, and cover with a LifterSlip. Use 20 μl for the 12 × 12-mm hydrogels. The coverslip should be at least 1/4 inch larger than the array in each dimension, because the background is often higher at the edges of the coverslip.

15. Incubate for 2 h at room temperature with constant shaking.

16. Rinse briefly in PBST to remove the LifterSlip.

17. Wash three more times for 10 min each in fresh PBST. (All washes are performed in racks at room temperature.)

18. Rinse for 20 s in PBS. Alternatively, perform the final washes in H₂O, 5 min each with gentle agitation.

19. Dry the slides by centrifugation before scanning.
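The volume and concentration arithmetic in steps 2, 4, and 6 can be scripted so that it is applied the same way across many samples. The Python sketch below implements three calculations: the per-array serum volume (array volume divided by the dilution factor), the (Va × A)/Nr rule for pooled reference components, and the dilution that occurs when 1/20 volume of dye stock or quencher is added. It assumes volumes are additive. The 7 mM stock in the usage lines is simply the lower end of the 7–10 mM dye-stock range in Section 2.2. When checking against the 100–300 μM target and the 200-fold quencher excess, use the concentration of your actual stock.

```python
def serum_volume_per_array(array_volume_ul: float, dilution_factor: float) -> float:
    """Serum volume per array, e.g., 20 ul at a 1/50 final dilution -> 0.4 ul."""
    return array_volume_ul / dilution_factor


def reference_component_volume(va_ul: float, n_arrays: int, n_pooled: int) -> float:
    """Volume of each pooled-reference component to label: (Va x A) / Nr."""
    return va_ul * n_arrays / n_pooled


def conc_after_adding(stock: float, fraction: float) -> float:
    """Concentration of an additive after adding `fraction` volume of its stock.

    Assumes additive volumes: adding 1/20 volume gives stock / 21.
    """
    return stock * fraction / (1.0 + fraction)


def quench_excess(dye_um: float, quencher_stock_um: float, fraction: float) -> float:
    """Molar excess of quencher over dye after adding `fraction` volume of quencher.

    The dye is diluted by the same addition, so both end up in one final volume.
    """
    quencher_um = conc_after_adding(quencher_stock_um, fraction)
    dye_after_um = dye_um / (1.0 + fraction)
    return quencher_um / dye_after_um


if __name__ == "__main__":
    va = serum_volume_per_array(20.0, 50.0)  # step 2 example
    print("serum per array (ul):", va)
    print("each reference component (ul):", reference_component_volume(va, 20, 10))
    dye_um = conc_after_adding(7000.0, 1 / 20)  # hypothetical 7 mM stock, 1/20 volume
    print("dye in reaction (uM):", round(dye_um, 1))
    # 1 M Tris-HCl = 1,000,000 uM, added at 1/20 volume (step 6)
    print("quencher:dye excess:", round(quench_excess(dye_um, 1_000_000.0, 1 / 20), 1))
```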

3.3. Scanning and Data Analysis

1. Scan the slides at 552 nm and 635 nm using a microarray fluorescence scanner (see Note 12).

2. Process the data: grid the arrays and reject unsatisfactory data points (see Note 13).

3. Normalize the data (see Note 14).

4. Analyze the data (see Note 15).

5. Interpret the data (see Note 16).
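Note 12 describes the spot-level processing: local background subtraction in each channel and rejection of unsatisfactory spots. As a sketch of that processing, the Python code below computes a background-subtracted log2(Cy5/Cy3) ratio per spot. Cy5 is taken as the test sample and Cy3 as the reference, as in step 5 of Section 3.2. The Spot record, the 100-unit net-intensity cutoff, and the use of a log ratio are illustrative assumptions. This sketch does not reproduce the normalization described in Note 14.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Spot:
    antibody: str
    cy5_fg: float          # test sample (Cy5) foreground intensity
    cy5_bg: float          # local background around the spot, Cy5 channel
    cy3_fg: float          # reference (Cy3) foreground intensity
    cy3_bg: float          # local background around the spot, Cy3 channel
    flagged: bool = False  # set when the spot has an obvious defect


def log2_ratio(spot: Spot, min_net: float = 100.0) -> Optional[float]:
    """Background-subtracted log2(Cy5/Cy3), or None if the spot is rejected.

    min_net is a hypothetical cutoff for "low net fluorescence" in either channel.
    """
    net5 = spot.cy5_fg - spot.cy5_bg
    net3 = spot.cy3_fg - spot.cy3_bg
    if spot.flagged or net5 < min_net or net3 < min_net:
        return None
    return math.log2(net5 / net3)


if __name__ == "__main__":
    spots = [
        Spot("AB0001", 2400.0, 200.0, 1300.0, 200.0),
        Spot("AB0002", 260.0, 220.0, 900.0, 180.0),  # weak Cy5 signal
        Spot("AB0003", 1800.0, 150.0, 1700.0, 160.0, flagged=True),
    ]
    for s in spots:
        print(s.antibody, log2_ratio(s))
```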

4. Notes

1. Radioactive, fluorescent, and chemiluminescent detection methods have all been used with antibody arrays. Radioactivity is used infrequently because of safety concerns and long exposure times (up to 10 h). Fluorescence is one of the most frequently used detection methods. Fluorophores, like chromogens, exist in many formulations and have defined emission spectra. Fluorescein, rhodamine (Texas Red), phycobiliproteins, nitrobenzoxadiazole (NBD), acridines, Cy3, Cy5, and BODIPY compounds are commonly used for protein labeling (13,14,15,16,17). The choice of fluorophore for microarrays depends on the sample type, the substratum, the emission characteristics, and even the number of analytes to be assayed.

Not all substrates are compatible with fluorescent detection, because inherent autofluorescence of the material significantly reduces the signal-to-noise ratio (14,15,16,17). With laser scanner detection, nitrocellulose-coated slides cause more light scatter and higher background than aldehyde-treated slides, which limits the use of nitrocellulose substrata for fluorescent detection (13,14,15,16,17). The sample may also contain components that interfere with a selected fluorophore. For example, flavoproteins autofluoresce in the same region as fluorescein, which limits the use of this fluorophore in samples rich in flavoproteins, such as liver and kidney tissues.

Photobleaching and quenching of fluorophores can decrease the total signal observed on an array. Cy3 and Cy5 dyes are commonly used for fluorescent detection because they are less affected by these problems. They are well suited to fluorescence detection because of their reduced dye interactions and increased brightness, and because charged groups can be added to the molecules (13,14,15,16,17). Fluorescently tagged proteins, including antibodies, can be used to detect immobilized molecules on a microarray in either indirect or sandwich formats. Streptavidin–biotin or rolling-circle amplification (RCA) chemistries can also be applied to fluorescence detection (22,23,24), providing sufficient sensitivity for most applications.

Chemiluminescent detection methods are based on Western blotting protocols, in which antigen-bound antibodies are detected with secondary antibodies conjugated to alkaline phosphatase or horseradish peroxidase (HRP) (13,14,15,16,17,18). Chemiluminescent detection can be applied with any of the labeling methods. Chemiluminescence is highly sensitive, but its dynamic range and its limited compatibility with multiplexing may be restrictive. Amplification strategies such as biotinyl tyramide can be applied to chemiluminescence.

A useful application is total protein determination made directly on arrays using a ruthenium organic complex that interacts non-covalently with proteins immobilized on nitrocellulose (13,14,15,16,17,18). The dye is applicable to arrays printed on nitrocellulose membranes. This type of total protein analysis is useful for minute sample volumes for which standard spectrophotometric protein analysis would not be feasible.

2. Antibody selection. The first critical step is selecting the protein targets to be measured with the antibody arrays, which depends on the experimental design and the objectives of the analysis. It is advisable to have biological or experimental evidence supporting the search for specific proteins in serum. One efficient approach is to perform high-throughput profiling at the DNA or RNA level before protein profiling, which increases the probability of finding a target protein in the serum.

Not all proteins are suitable for measurement with this assay, because protein size and likely abundance in the samples are limiting factors. A very small protein (or a polypeptide) may not be compatible with direct labeling detection methods, which rely on size-based separation of the labeled product from the free label. A very low-abundance protein may fall below the detection limit of the assay. Detection limits depend on the antibody used, the protein background in the sample, and the detection conditions. In general, the direct labeling method described here can give detection limits in the low ng/ml range for targets present in a serum background.

Once the target proteins have been chosen, the search for antibodies begins. The main bottleneck in developing highly multiplexed planar antibody arrays is the need for a specific affinity ligand for each analyte. Commercial antibodies against novel or rare proteins may not exist, which leaves custom antibody production as the only option. Custom antibody generation is lengthy and expensive, and probably not a viable choice for more than a few antibodies. If a protein target is more common and a choice of antibodies exists, look for antibodies that perform well in enzyme immunoassays, since these assays are quite similar to antibody arrays.

Monoclonal antibodies seem to have a higher success rate. Polyclonal antibodies may also work well, although they may lead to high background and reduced specificity and sensitivity compared with monoclonal antibodies. In vitro selection of antibodies by phage, ribosome, or mRNA display, and the use of engineered binding molecules, are playing an increasingly important role in generating specific affinity ligands for analytes for which antibodies are unavailable (14). An alternative, validated strategy for producing specific antibodies designs protein subfragments of a selected size that have minimal sequence similarity to other proteins. The fragments are chosen by an alignment-scanning procedure based on the principle of lowest sequence similarity to other human proteins, with the aim of generating antibodies of high selectivity (20).

The direct labeling method needs only one antibody per target, whereas a sandwich assay requires a matched pair of antibodies. The direct labeling method works well for mid- to high-abundance proteins, whereas sandwich assays or amplification protocols are recommended for low-abundance proteins.

Antibodies cannot be manufactured with a predefined affinity and specificity, so the specificity and sensitivity of each antibody should be validated before it is used as a probe on protein arrays. Detection of a single band at the expected molecular weight by Western blotting is a standard validation strategy for the specificity and sensitivity of a candidate antibody. Immunoprecipitation followed by mass spectrometry is another (1,6). The antigen–antibody properties of the antibodies printed on the arrays can be evaluated by estimating random and systematic errors. Findings for antibodies identified by the arrays can be confirmed by an independent method, using commercial or custom-made enzyme immunoassays on the same serum specimens.

Recombinant antigens can be used as positive and negative controls for printing (depositing the antibodies onto the slides), calibration, and detection (1,2,9). The linear range of the assay depends on the antibody–antigen affinity, and linearity can only be achieved when the concentrations of analyte and antibody are matched to the affinity constant. When using antibody arrays, it is advisable to include dilution and recovery experiments that evaluate the specificity and affinity of the antibodies for their ligands (2,9).

3. Purity of antibodies. Antibodies work best in the arrays when they are highly purified. Antibodies in a high background of other proteins often give weakened or non-specific signals, because the background proteins occupy many of the binding sites on the microarray.

Some purified antibodies are supplied in a BSA or gelatin stabilizer. It may be desirable to remove gelatin, since it can bind some biological molecules. BSA rarely causes non-specific binding. However, BSA at a much higher concentration than the antibody can significantly reduce the antibody signal, which would warrant further purification of the antibody. Some antibodies are supplied in a high concentration (8–50%) of glycerol to improve stability. Glycerol does not interfere with the assay, but the added viscosity may impair printing, so glycerol concentrations above 20% should be avoided.

To change the buffer of an antibody, the Bio-Rad Micro Bio-Spin P-30 column is recommended. These columns are supplied in one of two buffers, saline-sodium citrate (SSC) or Tris, and the sample comes through in the packing buffer. The packing buffer can be exchanged by running a different buffer through the column three times. The P-30 column removes solution components smaller than 30 kDa, and the P-6 column removes components smaller than 6 kDa. The P-30 column is therefore better for purifying antibodies, and the P-6 column is better for purifying complex mixtures in which low-molecular-weight species should be preserved. If the antibody is to be labeled subsequently, do not place it in Tris or any other amine-containing buffer.

Polyclonal antibodies are supplied in three forms: unpurified antisera, the IgG fraction of antisera, or the affinity-purified (purified using the antigen) fraction of antisera. Affinity-purified antibodies are best, since they have the highest purity of specific antibody, and IgG-purified fractions of antisera usually work well. Antibodies supplied in ascites fluid may also need to be purified. A good monoclonal antibody will work well without further purification, so monoclonal antibodies should be tested unpurified first. For IgG purification, the Affi-Gel Protein A MAPS II kit (Bio-Rad) is recommended. In general, consider the following purification requirements: (1) all antibodies that arrive as antisera need to be IgG-purified; (2) antibodies in ascites fluid may also need to be purified, although they can first be tested without purification.

4. Stability and concentration. Antibodies are stable when refrigerated in a standard buffer such as PBS. Antibody concentration can be measured with a protein assay kit such as the BCA 200 Protein Assay Kit (Pierce Biotechnology). The optimal spotting concentration range is 100–200 μg/ml. Higher concentrations may give stronger signals and lower detection limits, and may be desirable if antibody consumption is not a concern. Keep each antibody's concentration constant across printing runs, since variations in antibody concentration can affect the data. For example, if a data set is produced with a particular antibody at 300 μg/ml, subsequent experiments should also use that antibody at 300 μg/ml so that the results remain comparable.
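Keeping each antibody at the same spotting concentration across print runs is easier if every working dilution is computed the same way. The Python sketch below uses C1V1 = C2V2. The stock concentration would come from the BCA measurement. The 1000 μg/ml stock and 20-μl final volume in the usage lines are hypothetical.

```python
from typing import Tuple


def stock_volume_for_dilution(stock_ug_ml: float, target_ug_ml: float,
                              final_volume_ul: float) -> Tuple[float, float]:
    """Return (antibody stock in ul, diluent in ul) from C1*V1 = C2*V2."""
    if stock_ug_ml <= 0 or target_ug_ml <= 0:
        raise ValueError("concentrations must be positive")
    if target_ug_ml > stock_ug_ml:
        raise ValueError("target concentration exceeds the stock concentration")
    stock_ul = target_ug_ml * final_volume_ul / stock_ug_ml
    return stock_ul, final_volume_ul - stock_ul


if __name__ == "__main__":
    # Hypothetical: a 1000 ug/ml stock printed at 200 ug/ml in 20 ul final volume.
    ab_ul, pbs_ul = stock_volume_for_dilution(1000.0, 200.0, 20.0)
    print(f"{ab_ul:.1f} ul antibody + {pbs_ul:.1f} ul PBS")
```

The check for a target above the stock concentration catches antibodies that are too dilute to reach the chosen spotting concentration. Those antibodies would need concentrating, or a consistently lower spotting concentration.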

5. Antibody storage. Most antibodies can be stored refrigerated for up to a year. New antibodies should be divided into aliquots that will each last approximately a year. Keep one aliquot in the refrigerator as a working stock and freeze the others at –70°C. Aliquoting antibody stocks avoids repeated freeze–thaw cycles, which can damage the proteins. Protein stocks should not be frozen diluted in PBS; it is better to freeze them undiluted. When retrieving antibodies or proteins from a freezer stock, thaw them slowly on ice to reduce damage from the thawing process.

6. Tracking antibodies. It is helpful to keep information about the antibodies in a database. Assign a number code to each antibody. If the buffer composition of an antibody is changed, assign a new code to the new preparation. Relevant information to track includes clonality, manufacturer, animal of origin, concentration, and aliquot age. Record as much of the information on the antibody datasheet as possible, and label aliquots accordingly.
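Below is a minimal sketch of such a database in Python, using an in-memory registry instead of a real database. The "AB0001"-style codes and the field names are hypothetical. The one rule taken from the text is that a buffer change produces a new preparation with a new code and leaves the original record untouched.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class AntibodyPrep:
    code: str           # number code for this preparation
    target: str
    clonality: str      # "monoclonal" or "polyclonal"
    manufacturer: str
    host: str           # animal of origin
    buffer: str
    conc_ug_ml: float
    aliquot_date: str   # ISO date, used to work out aliquot age


class AntibodyRegistry:
    def __init__(self) -> None:
        self._preps: Dict[str, AntibodyPrep] = {}
        self._next = 1

    def _new_code(self) -> str:
        code = f"AB{self._next:04d}"
        self._next += 1
        return code

    def register(self, **kwargs) -> AntibodyPrep:
        prep = AntibodyPrep(code=self._new_code(), **kwargs)
        self._preps[prep.code] = prep
        return prep

    def change_buffer(self, code: str, new_buffer: str,
                      new_conc_ug_ml: float) -> AntibodyPrep:
        """A buffer exchange yields a new preparation with a new code."""
        old = self._preps[code]
        new = replace(old, code=self._new_code(), buffer=new_buffer,
                      conc_ug_ml=new_conc_ug_ml)
        self._preps[new.code] = new
        return new

    def get(self, code: str) -> AntibodyPrep:
        return self._preps[code]
```

Making the record immutable (`frozen=True`) means a preparation's history cannot be edited in place. Any change must go through `change_buffer`, which issues a new code.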

7. Maintaining antibody stocks. Maintain a refrigerator stock of ready-to-use antibodies (kept as working solutions). Except for antibodies that should not be frozen, keep only one tube of each antibody in the refrigerator at a time. The amount of each antibody in the refrigerator stock should last six months to a year (normally around 100 μl). Aliquot the rest of the antibody stock into similar volumes and freeze at –80°C. If the refrigerator stock needs to be diluted to reach the working concentration, dilute only enough for the working solution. When retrieving antibodies or proteins from a freezer stock, thaw them slowly on ice to reduce damage from the thawing process. Update the protein stock master list to record when antibodies are thawed and frozen.

8. Print plate preparation. After the antibodies have been acquired and prepared at the proper purity and concentration, they are assembled into a “print plate,” a microtiter plate used for robotic printing of microarrays. Polypropylene microtiter plates are preferable to polystyrene because they adsorb less protein. The plate should be rigid and precisely machined for optimal functioning with printing robots. 384-well plates are generally more compatible with printing robots than 96-well plates and require less volume per well.

Load about 6–10 μl of each antibody into each well of the 384-well print plate. The volume may depend on the shape of the well and how far the print tips descend into it; too much volume may leave droplets of antibody solution on the outside of the print tip. The volume may also need to be optimized for particular applications; for example, multiple draws from each well require a greater volume. If printing is inconsistent or varies between print tips, fill multiple wells with the same antibody solution so that different print pins spot the same antibody.

Store the 384-well print plates sealed in the refrigerator until use. Aluminum foil tape provides a good seal, and enclosing the covered plate in a sealed plastic bag ensures long-term, evaporation-free storage. It is very important to prepare a spreadsheet of the well identities for use in downstream data processing.
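The well-identity spreadsheet can be generated rather than typed. The Python sketch below assigns antibody codes to 384-well positions (rows A–P, columns 1–24) in simple row-major order. It can optionally give each antibody several consecutive wells, and it writes the result as CSV with the standard `csv` module. The wells that each print pin actually samples depend on the robot's pin-head layout. The fill order here is therefore an assumption, and it would need adapting if replicate wells must be visited by different pins.

```python
import csv
import io
from string import ascii_uppercase
from typing import List, Tuple

ROWS = ascii_uppercase[:16]  # A-P on a 384-well plate
COLS = range(1, 25)          # 1-24


def well_ids() -> List[str]:
    """All 384 well IDs in row-major order: A1, A2, ..., P24."""
    return [f"{r}{c}" for r in ROWS for c in COLS]


def plate_map(antibodies: List[str], wells_per_antibody: int = 1) -> List[Tuple[str, str]]:
    """Assign each antibody code to consecutive wells (replicate wells if > 1)."""
    ids = well_ids()
    needed = len(antibodies) * wells_per_antibody
    if needed > len(ids):
        raise ValueError(f"{needed} wells needed but the plate has {len(ids)}")
    entries = [ab for ab in antibodies for _ in range(wells_per_antibody)]
    return list(zip(ids, entries))


def to_csv(mapping: List[Tuple[str, str]]) -> str:
    """Render the plate map as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["well", "antibody_code"])
    writer.writerows(mapping)
    return buf.getvalue()


if __name__ == "__main__":
    mapping = plate_map(["AB0001", "AB0002", "AB0003"], wells_per_antibody=2)
    print(to_csv(mapping))
```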

9. Selection of slides. Immobilization and detection strategies are devised according to which target molecules are to be measured and which molecules are used to capture them. The attributes of an ideal substratum for antibody arrays include limited non-specific binding, a high surface area-to-volume ratio, inertness toward biological molecules, minimal autofluorescence, and compatibility with available detection methods.

A variety of surfaces and immobilization chemistries have been described for antibody arrays. Derivatized supports on which capture antibodies are immobilized include polyvinylidene difluoride, nitrocellulose, agarose, polyacrylamide, and hydrogels. Glass slides are frequently coated with one-, two-, or three-dimensionally structured surface modifications and activated with aldehyde, poly-lysine, or a homobifunctional cross-linker, chosen as part of the initial optimization experiments (2,9,14). The advantages of distinct coatings or surfaces under different blocking, pH buffering, or UV cross-linking conditions for specific applications have been described (14). Silane-coated glass slides or acrylamide hydrogels can provide good day-to-day reproducibility, efficient antibody immobilization, and low background when used with fluorescence detection. Reported substrates for antibody arrays include poly-lysine-coated glass (1), aldehyde-coated glass (30), nitrocellulose (31), and a polyacrylamide-based hydrogel (32).

Hydrogels and nitrocellulose give good results with the direct labeling method described here. Nitrocellulose slides require no preparation before printing and give clean, low-background results. Hydrogel coatings on glass slides (such as those supplied by PerkinElmer Life Sciences) can support multiple layers of protein, which increases binding capacity and signal strength. The hydrophilic matrix of the hydrogel may also better retain native protein structure. Hydrogels should be stored dry at room temperature and must be used within 2 days after preparation.

10. Printing of antibody arrays. The details of printing depend on the printing robot used. Antibodies must be immobilized so that the functional component is efficiently deposited without interfering with subsequent binding. Humidity, temperature, dust levels, and pin washing should also be stringently controlled during printing. To keep evaporation of the antibody solutions low, minimize the time the print plates are unsealed and exposed. Maintaining a moderately high humidity in the printing environment (around 45%) minimizes evaporation and maintains spot quality, whereas excessive humidity can lead to overly large spots. Correct operation of the printing robot should be confirmed with test prints on dummy slides before microarray production begins; 500 μg/mL BSA in 1× PBS is advisable for the test prints. If the pins are washed in a wash bath, change the water every 6–12 loads to prevent contamination of the pins. It is also desirable to confirm that the pins are washed sufficiently and that there is no carry-over from load to load. This can be tested by loading labeled protein into one of the print plate wells in a dummy print and then scanning the unwashed slide. If fluorescence is seen in spots printed after the fluorescently labeled material, the pins need to be washed more stringently. Most microarrayers allow replicate spots to be printed on each array from the same well of the print plate. Replicate spots are useful to obtain more precise data through averaging


280 Sanchez-Carbayo<br />

and to ensure that data are still acquired if a portion of the array is unusable. Six to ten spots per antibody per array are recommended.

11. Serum sample handling and storage. Sera should be collected in red gel tubes, allowed to clot and the clot to retract, centrifuged at 3000 g for 10 min, aliquoted, and stored at –80°C. All samples should be numbered consecutively so that no record compromises the identity of the patients or controls under study. Serum samples should be handled as biohazards, and tips and tubes that contact serum should be disposed of in a biohazard bag. Samples need to be aliquoted upon the first thaw, so that no more than four thaws are necessary for any experiment. Low-volume aliquots (approximately 10–15 μL) of each specimen are recommended. For more than approximately 50 samples, it is convenient to aliquot using microtiter plates: approximately 50 μL of each sample is placed into a well of a 96-well microtiter plate, and either a robot or a matrix multichannel pipettor is used to aliquot small volumes into replicate 96-well plates.
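As an illustration of the plate-based aliquoting described in this note, the following Python sketch assigns consecutively numbered samples to 96-well positions. The row-by-row filling order and the function name `plate_map` are assumptions for illustration, not part of the protocol; a given robot or pipettor may fill wells in a different order.

```python
def plate_map(n_samples, rows="ABCDEFGH", n_cols=12):
    """Map consecutively numbered samples (1..n_samples) to 96-well positions.

    Hypothetical helper: wells are filled row by row (A1, A2, ..., A12, B1, ...),
    and a new plate is started every len(rows) * n_cols samples.
    Returns a list of (sample_number, plate_number, well) tuples.
    """
    per_plate = len(rows) * n_cols
    layout = []
    for sample in range(1, n_samples + 1):
        index = sample - 1
        plate = index // per_plate + 1      # plates are numbered from 1
        within = index % per_plate          # position inside the current plate
        row = rows[within // n_cols]
        col = within % n_cols + 1           # columns are numbered 1..n_cols
        layout.append((sample, plate, f"{row}{col}"))
    return layout
```

With 100 samples, samples 1–96 fill plate 1 (A1 to H12) and samples 97–100 start plate 2 at A1; the same map can then be reused for each replicate plate.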

12. Scanning. The fluorescence signal from the microarrays is detected with a microarray scanner, and the image data are quantified with GenePix Pro 3.0 software (Axon Instruments). The local background in each color channel is subtracted from the signal at each antibody spot, and spots with obvious defects, no signal detectable by GenePix, or low net fluorescence in either color channel are removed from the analysis. For each antibody spot, the ratio of the net signal in the sample-specific channel to the net signal in the reference-specific channel is calculated, and the ratios from replicate measurements of each antibody in an array are averaged. An intensity-dependent normalization algorithm for antibody arrays is recommended.

Some particulars of the scanning method depend on the instrument, but some general principles apply. If possible, all arrays in an experiment set should be scanned immediately after incubation and on the same day, to minimize noise introduced by variable breakdown of dye on the array (particularly Cy5). The microarrays should be kept in the dark to minimize photobleaching of the fluorescent dyes. Scanners typically have adjustments for laser power, detector gain, and scan rate. Set both lasers to about 95% and adjust the scanner to achieve the desired signal intensities. Adjust the laser power so that at least 50% of the pixels of each spot are saturated. The laser power should almost always be set very close to the maximum, since even the maximum power of small commercial scanners is less than optimal. Lower scan rates generally produce higher signal-to-noise ratios, but scanning large sets of arrays imposes a practical time limit on the scan rate; scanning is therefore performed at either 50% or 25% speed, depending on the time available. To find the optimal scanner settings, set the laser power close to maximum, set the scan rate to the lowest acceptable value, and then raise the detector gain as high as possible without saturating the signal. When a large set of arrays is scanned as part of a single experiment set, similar settings should be used for all arrays to minimize the differences


Dissecting Cancer Serum Protein Profiles 281<br />

in conditions between the arrays. It may not always be possible to use the same settings for every slide, owing to large variations in signal and background strength, but subsequent normalization should adjust the data accordingly. Scanned images are typically stored as TIFF files for analysis by microarray analysis programs. It is advisable to name the scanned image files by slide number, followed by the channel (Cy3 or Cy5) and the date of scanning.
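A minimal Python sketch of the per-spot calculation described in Note 12: background subtraction, rejection of flagged or low-signal spots, the sample/reference ratio, and averaging over replicates. The dictionary keys (`sample_fg`, `sample_bg`, `ref_fg`, `ref_bg`, `flagged`) and the `min_net` cutoff are hypothetical; they are not GenePix column names, and an exported GenePix report would first have to be mapped onto them.

```python
from collections import defaultdict
from statistics import mean


def antibody_ratios(spots, min_net=0.0):
    """Average sample/reference ratios over the replicate spots of each antibody.

    Each spot is a dict with keys 'antibody', 'sample_fg', 'sample_bg',
    'ref_fg', 'ref_bg' and an optional boolean 'flagged' (e.g. a visible defect).
    Spots that are flagged, or whose background-subtracted (net) signal is not
    above min_net in either channel, are excluded from the average.
    """
    ratios = defaultdict(list)
    for spot in spots:
        if spot.get("flagged", False):
            continue  # obvious defect flagged during gridding
        net_sample = spot["sample_fg"] - spot["sample_bg"]
        net_ref = spot["ref_fg"] - spot["ref_bg"]
        if net_sample <= min_net or net_ref <= min_net:
            continue  # low net fluorescence in either channel
        ratios[spot["antibody"]].append(net_sample / net_ref)
    # Antibodies with no surviving spots are simply absent from the result.
    return {antibody: mean(values) for antibody, values in ratios.items()}
```

Averaging the ratios, rather than averaging each channel first, follows the order of operations given in the note; the result is a per-array value for each antibody that still needs normalization.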

13. Gridding and rejection of data points. The analysis of scanned microarray data depends somewhat on whether the experiment is a one-color or a two-color direct-labeling experiment. In all experiment types, the image data must first be converted into numbers. Software programs supplied with current scanners, such as GenePix with Axon scanners and ArrayQuant with PerkinElmer scanners, accomplish this. The details of using such programs are not discussed here, but the principles they use are outlined. Quantification of microarray data begins with loading the scanned images (usually in TIFF format) into an analysis program and overlaying a grid that defines the locations of the antibody spots. After the grid is aligned to the image data, the program calculates intensities and various statistics for image areas both inside and outside the spots. The user can "flag," or reject, spots with obvious gross defects. Spots with very low intensity in one or both color channels yield unreliable data and should be rejected. Rejecting low-intensity spots is especially important in two-color ratio experiments, since noisy low-intensity data can greatly distort the ratio. It is desirable to define statistical criteria for rejecting low-intensity spots rather than relying on user judgment. A threshold can be defined based on the overall variation in background on the arrays: the median signal intensity of each spot should be at least three standard deviations (of the background areas) above the local background median intensity. This objective criterion provides a uniform, statistically based standard for all data.
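The three-standard-deviation rejection criterion in Note 13 can be written as a small Python check. This sketch assumes that the spot median, the local background median, and a background standard deviation are already available for each channel (for example, from the scanner software's report); the function names are hypothetical.

```python
def passes_intensity_threshold(spot_median, bg_median, bg_sd, n_sd=3.0):
    """True if the spot median is at least n_sd background SDs above the background median."""
    return spot_median >= bg_median + n_sd * bg_sd


def keep_spot(channels, n_sd=3.0):
    """Apply the criterion to every color channel; reject the spot if any channel fails.

    channels: iterable of (spot_median, bg_median, bg_sd) tuples, one per channel.
    """
    return all(passes_intensity_threshold(s, b, sd, n_sd) for s, b, sd in channels)
```

For a two-color spot, `keep_spot([(500, 100, 50), (150, 100, 20)])` rejects the spot because the second channel (150) falls below 100 + 3 × 20 = 160.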

14. Normalization of data. The signals obtained from each array need to be corrected, or normalized, for possible changes in overall signal intensity caused by factors such as scanner settings and dye-labeling efficiency. This process uses the signals from antibodies that target an internal standard of known concentration. Antibodies against proteins commonly present in serum, such as immunoglobulin isotypes, albumin, or C-reactive protein, can be used as internal controls. For each array, a normalization factor is calculated that sets the data from the normalization antibodies to their expected or known values. A highly specific and quantitatively accurate antibody is required to measure the normalization protein. The protein standards can either be present naturally in the sample or be spiked in. A spiked-in standard that works well is FLAG-labeled BSA; FLAG is a widely used peptide tag for which commercial labeling kits are available. Other tags, such as DNP, can also work well.
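A minimal Python sketch of the per-array normalization factor described above, assuming one value per antibody for the array and known expected values for the control antibodies. Taking the median of expected/measured over several controls is an assumption made here for robustness (with a single control it reduces to expected/measured), and the antibody names in the example (`FLAG-BSA`, `CEA`) are illustrative only.

```python
from statistics import median


def normalize_to_controls(values, expected_controls):
    """Scale one array's antibody values so that control antibodies match expected values.

    values: dict antibody -> measured value (e.g. averaged ratio) for one array.
    expected_controls: dict control antibody -> expected/known value; every
    control must also appear in `values`.
    Returns (normalized_values, factor).
    """
    factors = [expected_controls[ab] / values[ab] for ab in expected_controls]
    factor = median(factors)  # one factor per array (assumption: median over controls)
    return {ab: v * factor for ab, v in values.items()}, factor
```

For example, if the FLAG-BSA control reads 0.5 where 1.0 is expected, the factor is 2 and every antibody value on that array is doubled.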

Normalization is best based on an intensity-dependent algorithm, as follows (24). The local background in each color channel is subtracted from the signal at each antibody spot, and spots with obvious defects, no signal detectable by GenePix, or low net fluorescence in either color channel are removed from the analysis. For each antibody spot, the ratio of the net signal in the sample-specific channel to the net signal in the reference-specific channel is calculated, and the ratios from replicate measurements of each antibody in the same array are averaged. It is common to plot a red (Cy5) versus green (Cy3) channel scatter plot to examine the distribution of intensities; however, transforming the data to fold change versus average intensity displays them in a more easily readable form. If $I_{\mathrm{red}}$ is the background-subtracted red channel intensity and $I_{\mathrm{green}}$ is the background-subtracted green channel intensity, the following variables are defined:

$$R = \frac{I_{\mathrm{red}}}{I_{\mathrm{green}}}, \qquad A = \sqrt{I_{\mathrm{red}} \times I_{\mathrm{green}}},$$

where $R$ is simply the fold-change ratio and $A$ is the average intensity (the geometric mean, which is equivalent to averaging the log intensities). Curvature in the scatter plot of $\log R$ versus $A$ indicates a dependence of the ratio $R$ on the overall intensity. A curve fitted to this plot is then used to normalize the data:

$$\log\left(\frac{I_{\mathrm{red}}}{I_{\mathrm{green}}}\right) \rightarrow \log\left(\frac{I_{\mathrm{red}}}{I_{\mathrm{green}}}\right) - c(A),$$

where $c(A)$ is the fitted curve. This is equivalent to multiplying the green channel intensity (or dividing the red) by an intensity-dependent normalization constant $k(A)$, where $\log k(A) = c(A)$. After optimal normalization, the plot of $\log R$ versus $A$ should be horizontal and centered on zero (24).
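The intensity-dependent normalization above can be sketched in plain Python. The text does not name the curve-fitting method; loess is a common choice for this step, and this sketch swaps in a simpler running median of $\log_2 R$ over the spots nearest in rank of $A$ to estimate $c(A)$. The function name and the `window` parameter are assumptions; base-2 logarithms are used, and $A$ is taken on the log scale, a monotone transform that does not change which spots are neighbors.

```python
import math
from statistics import median


def intensity_dependent_normalize(i_red, i_green, window=5):
    """Remove intensity-dependent bias from log ratios (simplified sketch).

    i_red, i_green: background-subtracted intensities (all > 0), one per spot.
    c(A) is estimated as the running median of log2 R over the `window` spots
    nearest in rank of A (a stand-in for a loess fit).
    Returns the normalized log2 ratios, log2(R) - c(A), in input order.
    """
    n = len(i_red)
    log_r = [math.log2(r / g) for r, g in zip(i_red, i_green)]
    log_a = [0.5 * math.log2(r * g) for r, g in zip(i_red, i_green)]  # log2 of geometric mean
    order = sorted(range(n), key=lambda i: log_a[i])  # spot indices sorted by A
    half = window // 2
    normalized = [0.0] * n
    for rank, i in enumerate(order):
        # Window of `window` neighbors centered on this rank, clipped at both ends.
        lo = max(0, min(rank - half, n - window))
        neighbors = order[lo:lo + window]
        c_a = median(log_r[j] for j in neighbors)
        normalized[i] = log_r[i] - c_a
    return normalized
```

A constant dye bias (every red value twice the green value) is removed completely, while a genuinely deviating spot keeps its excess log ratio relative to its neighbors.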

15. Data analysis. A critical step in using quantitative data from antibody arrays is establishing a filtering process to assess data quality. Because label-based antibody arrays are conceptually similar to two-color competitive-detection genomic arrays, normalization and data analysis tools classically used for cDNA arrays have been applied to protein profiling with antibody arrays (24). To measure multiple proteins simultaneously with high sensitivity, specificity, quantitative accuracy over large concentration ranges, and reproducibility, quality control issues must be considered in the design of the arrays (1,4,9). Filtering and data analysis procedures subsequently address the linearity, calibration, and specificity of the antibodies, as well as whether the labeling and/or hybridization protocols are adequately optimized to ensure high signal-to-noise ratios (3,24). The first level of quality control concerns the experimental design of the printed antibody arrays, which should include replicate spots dispersed over the entire surface of the array, as well as controls in every experiment to evaluate the intra- and inter-assay reproducibility of the measurements (1,4,9). The array should also include appropriate controls to test for potential antibody interference and cross-reactivity. In this regard, the quantity of antibody spotted can be used to standardize the antigen concentration: in an internally controlled system, one color represents the amount of antibody spotted and the other represents the amount of antigen, which is used to quantify the level of protein expression. This normalization for antibody spot intensity can decrease variability and lower the limits of detection of antibody arrays.

The initial quality control of scanned data is performed at the spot level using the scanner software, e.g., GenePix (24). The customized report it creates can be used to analyze spot quality and to flag low-quality spots. The criteria for flagging spots may include the number of standard deviations above background, the R², or the percent saturation (3,24). At the array level, quality control includes normalization of the array and calculation of the mean and standard deviation of the intensities of each antibody across its replicates on the slide (3–24). Antibodies with a high standard deviation between replicate spots can be filtered out. Arrays can be normalized using the average intensity of each array (24), protein standards such as immunoglobulin G (1,21), or internal controls based on antibody spot intensity (31).
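The replicate-level filter described above might look like this in Python. The `max_sd` cutoff is a hypothetical parameter to be chosen for each platform, and the input is assumed to be one array's replicate values per antibody (e.g. background-subtracted ratios).

```python
from statistics import mean, stdev


def summarize_replicates(replicates, max_sd):
    """Mean and SD of each antibody's replicate values; drop antibodies with SD > max_sd.

    replicates: dict antibody -> list of replicate values from one array.
    At least two replicates are needed for a standard deviation.
    Returns (kept, dropped): kept maps antibody -> (mean, sd); dropped lists names.
    """
    kept, dropped = {}, []
    for antibody, values in replicates.items():
        if len(values) < 2:
            dropped.append(antibody)  # SD undefined with a single replicate
            continue
        sd = stdev(values)  # sample standard deviation across replicate spots
        if sd > max_sd:
            dropped.append(antibody)
        else:
            kept[antibody] = (mean(values), sd)
    return kept, dropped
```

Because the cutoff is on the absolute SD, it assumes replicate values on a comparable scale; a coefficient of variation would be an alternative for raw intensities spanning a wide range.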

At the next level of data filtering, each experiment set is compared, and the results are calibrated against a dilution series by a best-fit line, removing data with high variability. The results can also be correlated with independent measurements from enzyme immunoassays (ELISA) available for targets included in the antibody arrays. At this step, an antibody whose dilution series performs poorly can be flagged. Expression thresholds can also be set for each antibody, specifying the maximum and minimum ratios for spots to be considered in further analyses (24). This step is critical because it filters the input data based on the standard deviation between replicate spots and the output data based on the standard deviation of the dilution experiments. The last level of quality control is the comparison of independent experiment sets based on internal controls, which allows experiments performed on different days to be compared. The combined use of unsupervised and supervised methods can then identify protein patterns associated with disease progression and clinical outcome.
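The dilution-series calibration and ratio thresholds described above can be sketched with an ordinary least-squares line in Python. Using R² as the measure of a poorly performing series, the default `min_r2=0.9`, and the function names are all assumptions for illustration; the text does not specify a cutoff.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept; returns (slope, intercept, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    # A flat response (ss_tot == 0) does not track the dilution, so report R^2 = 0.
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return slope, intercept, r_squared


def flag_dilution_series(series, min_r2=0.9):
    """Return the antibodies whose dilution series fits a straight line poorly.

    series: dict antibody -> (dilution levels, measured signals).
    """
    return [ab for ab, (x, y) in series.items() if fit_line(x, y)[2] < min_r2]


def within_ratio_limits(ratio, min_ratio, max_ratio):
    """True if a ratio falls inside the thresholds set for further analysis."""
    return min_ratio <= ratio <= max_ratio
```

A series that doubles with each twofold increase in concentration fits with R² = 1 and is kept, whereas an erratic series is flagged; spots that pass can then be screened against the antibody's minimum and maximum ratios.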

16. Research procedures using antibody arrays have limitations associated with false-positive and false-negative results, which may be overcome using different strategies. Causes of false-negative results on antibody arrays include: (1) degradation of the protein by serum proteases during sample handling; and (2) interference in the antibody–antigen binding process, resulting in low detection of the target protein. The specificity of the targets for bladder cancer progression is addressed by immunohistochemistry and by using antibodies targeting different epitopes. The specificity of antigen–antibody binding is assessed by reverse protocols, printing of purified proteins, and Western blots. Adding protease inhibitors and storing serum at –80°C help avoid protein degradation during sample handling, and aliquoting serum avoids the degradation associated with repeated freeze–thaw cycles. Modified amplification protocols, such as rolling-circle amplification, may increase signal detection.

Similarly, causes of false-positive results on antibody arrays include: (A) binding of the antibody to nonspecific molecules or to degradation products of the target protein; (B) gelatin or other protein-based additives in the antibodies printed onto the arrays; (C) heterophilic antibodies in serum samples; and (D) nonspecific binding by antibodies present in patients with autoimmune or other diseases. False-positive results can be addressed in several ways. Cross-reactivity can be overcome by selecting alternative antibodies directed to other epitopes (A), or by using preservatives without gelatin (B). In cases C and D, the interference and recovery experiments proposed for the analytical validation of antibodies, using dilution and recovery coefficients, will estimate the amount of interference. Clinical records of coexisting diseases in the patients analyzed, enzyme immunoassays, and immunohistochemical analyses will help in interpreting unexpected results. As with false negatives, the specificity of antigen–antibody binding can be assessed by reverse protocols, printing of purified proteins, and Western blots.

5. Final Remarks

The methods and applications of antibody arrays are increasing in scope and effectiveness. Current antibody array formats, and new formats that may be developed in the near future, are likely to markedly accelerate biomarker discovery and the characterization of cancer-specific pathways, eventually leading to individualized therapies that take into account markers of disease predisposition and therapeutic response. However, multiple challenges remain in the design and application of antibody arrays (33,34,35): (1) poor understanding of protein immobilization; (2) limited dynamic ranges of no more than three orders of magnitude; (3) achieving accuracy and reproducibility similar to those of clinical immunoassays; (4) molecular protein complexity and denaturation affecting immunoreactivity; (5) lack of standards and calibrators; and (6) development of high-affinity, specific antibodies for target antigens. These challenges are being addressed by the multi-institutional effort of the Human Proteome Organization (HUPO) toward standardizing critical parameters in serum and plasma proteomic analyses. Initial studies provide guidance on pre-analytical variables that can alter the analysis of blood-derived samples, including choice of sample type, stability during storage, use of protease inhibitors, and clinical standardization (33; see also Chapter 2). As part of the HUPO approach, it is also critical to standardize statistical strategies for high-confidence protein identification and data analysis. These efforts and strategies toward integrating proteomic datasets should lead to an accurate and comprehensive representation of human proteomes (34,35).

References

1. Haab BB, Dunham MJ, Brown PO. (2001). Protein microarrays for highly parallel detection and quantitation of specific proteins and antibodies in complex solutions. Genome Biol. 2(2), research0004.1–0004.13.

2. Chan SM, Ermann J, Su L, Fathman CG, Utz PJ. (2004). Protein microarrays for multiplex analysis of signal transduction pathways. Nat Med. 10, 1390–6.

3. Sanchez-Carbayo M. (2006). Antibody arrays: technical considerations and clinical applications in cancer. Clin Chem. 52, 1651–9.

4. Barry R, Diggle T, Terrett J, Soloviev M. (2003). Competitive assay formats for high-throughput affinity arrays. J Biomol Screen. 8, 257–63.

5. Pang S, Smith J, Onley D, Reeve J, Walker M, Foy C. (2005). A comparability study of the emerging protein array platforms with established ELISA procedures. J Immunol Methods. 302, 1–13.

6. Lash GE, Scaife PJ, Innes BA, Otun HA, Robson SC, Searle RF, Bulmer JN. (2006). Comparison of three multiplex cytokine analysis systems: Luminex, SearchLight and FAST Quant. J Immunol Methods. 309, 205–8.

7. de Jager W, Rijkers GT. (2006). Solid-phase and bead-based cytokine immunoassay: a comparison. Methods. 38, 294–303.

8. Waterboer T, Sehr P, Pawlita M. (2006). Suppression of non-specific binding in serological Luminex assays. J Immunol Methods. 309, 200–4.

9. Kingsmore SF. (2006). Multiplexed protein measurement: technologies and applications of protein and antibody arrays. Nat Rev Drug Discov. 5, 310–21.

10. Wang X, Yu J, Sreekumar A, Varambally S, Shen R, Giacherio D, Mehra R, Montie JE, Pienta KJ, Sanda MG, Kantoff PW, Rubin MA, Wei JT, Ghosh D, Chinnaiyan AM. (2005). Autoantibody signatures in prostate cancer. N Engl J Med. 353, 1224–35.

11. Anderson KS, LaBaer J. (2005). The sentinel within: exploiting the immune system for cancer biomarkers. J Proteome Res. 4, 1123–33.

12. Petricoin EF III, Bichsel VE, Calvert VS, Espina V, Winters M, Young L, Belluco C, Trock BJ, Lippman M, Fishman DA, Sgroi DC, Munson PJ, Esserman LJ, Liotta LA. (2005). Mapping molecular networks using proteomics: a vision for patient-tailored combination therapy. J Clin Oncol. 23, 3614–21.

13. Angenendt P, Glokler J, Murphy D, Lehrach H, Cahill DJ. (2002). Toward optimized antibody microarrays: a comparison of current microarray support materials. Anal Biochem. 309, 253–60.

14. Espina V, Woodhouse EC, Wulfkuhle J, Asmussen HD, Petricoin EF III, Liotta LA. (2004). Protein microarray detection strategies: focus on direct detection technologies. J Immunol Methods. 290, 121–33.

15. Levit-Binnun N, Lindner AB, Zik O, Eshhar Z, Moses E. (2003). Quantitative detection of protein arrays. Anal Chem. 75, 1436–41.

16. Pawlak B, Gordon R. (2005). Density estimation for positron emission tomography. Technol Cancer Res Treat. 4, 131–42.

17. Schweitzer B, Roberts S, Grimwade B, Shao W, Wang M, Fu Q, Shu Q, Laroche I, Zhou Z, Tchernev VT, Christiansen J, Velleca M, Kingsmore SF. (2002). Multiplexed protein profiling on microarrays by rolling-circle amplification. Nat Biotechnol. 20, 359–65.

18. Pasternack RF, Collings PJ. (1995). Resonance light scattering: a new technique for studying chromophore aggregation. Science. 269, 935–9.

19. Stich N, Gandhum A, Matyushin V, Raats J, Mayer C, Alguel Y, Schalkhammer T. (2002). Phage display antibody-based proteomic device using resonance-enhanced detection. J Nanosci Nanotechnol. 2, 375–81.

20. Lindskog M, Rockberg J, Uhlen M, Sterky F. (2005). Selection of protein epitopes for antibody production. Biotechniques. 38, 723–7.

21. Miller JC, Zhou H, Kwekel J, Cavallo R, Burke J, Butler EB, Teh BS, Haab BB. (2003). Antibody microarray profiling of human prostate cancer sera: antibody screening and identification of potential biomarkers. Proteomics. 3, 56–63.

22. Zhou H, Bouwman K, Schotanus M, Verweij C, Marrero JA, Dillon D, Costa J, Lizardi P, Haab BB. (2004). Two-color, rolling-circle amplification on antibody microarrays for sensitive, multiplexed serum-protein measurements. Genome Biol. 5, R28.

23. Shao W, Zhou Z, Laroche I, Lu H, Zong Q, Patel DD, Kingsmore S, Piccoli SP. (2003). Optimization of rolling-circle amplified protein microarrays for multiplexed protein profiling. J Biomed Biotechnol. 5, 299–307.

24. Sanchez-Carbayo M, Socci ND, Lozano JJ, Haab BB, Cordon-Cardo C. (2006). Profiling bladder cancer using targeted antibody arrays. Am J Pathol. 168, 93–103.

25. Saviranta P, Okon R, Brinker A, Warashina M, Eppinger J, Geierstanger BH. (2004). Evaluating sandwich immunoassays in microarray format in terms of the ambient analyte regime. Clin Chem. 50, 1907–20.

26. Huang R, Lin Y, Shi Q, Flowers L, Ramachandran S, Horowitz IR, Parthasarathy S, Huang RP. (2004). Enhanced protein profiling arrays with ELISA-based amplification for high-throughput molecular changes of tumor patients' plasma. Clin Cancer Res. 10, 598–609.

27. Varnum SM, Woodbury RL, Zangar RC. (2004). A protein microarray ELISA for screening biological fluids. Methods Mol Biol. 264, 161–72.

28. Gembitsky DS, Lawlor K, Jacovina A, Yaneva M, Tempst P. (2004). A prototype antibody microarray platform to monitor changes in protein tyrosine phosphorylation. Mol Cell Proteomics. 3, 1102–18.

29. Janzi M, Odling J, Pan-Hammarstrom Q, Sundberg M, Lundeberg J, Uhlen M, Hammarstrom L, Nilsson P. (2005). Serum microarrays for large scale screening of protein levels. Mol Cell Proteomics. 4, 1942–7.

30. MacBeath G, Schreiber SL. (2000). Printing proteins as microarrays for high-throughput function determination. Science. 289, 1760–3.

31. Knezevic V, Leethanakul C, Bichsel VE, Worth JM, Prabhu VV, Gutkind JS, Liotta LA, Munson PJ, Petricoin EF III, Krizman DB. (2001). Proteomic profiling of the cancer microenvironment by antibody arrays. Proteomics. 1, 1271–8.

32. Arenkov P, Kukhtin A, Gemmell A, Voloshchuk S, Chupeeva V, Mirzabekov A. (2000). Protein microchips: use for immunoassay and enzymatic reactions. Anal Biochem. 278, 123–31.

33. Rai AJ, Gelfand CA, Haywood BC, Warunek DJ, Yi J, Schuchard MD, Mehigh RJ, Cockrill SL, Scott GB, Tammen H, Schulz-Knappe P, Speicher DW, Vitzthum F, Haab BB, Siest G, Chan DW. (2005). HUPO Plasma Proteome Project specimen collection and handling: towards the standardization of parameters for plasma proteome samples. Proteomics. 5, 3262–77.

34. States DJ, Omenn GS, Blackwell TW, Fermin D, Eng J, Speicher DW, Hanash SM. (2006). Challenges in deriving high-confidence protein identifications from data gathered by a HUPO plasma proteome collaborative study. Nat Biotechnol. 24, 333–8.

35. Uhlen M, Bjorling E, Agaton C, Szigyarto CA, Amini B, Andersen E, Andersson AC, Angelidou P, Asplund A, Asplund C, Berglund L, Bergstrom K, Brumer H, Cerjan D, Ekstrom M, Elobeid A, Eriksson C, Fagerberg L, Falk R, Fall J, Forsberg M, Bjorklund MG, Gumbel K, Halimi A, Hallin I, Hamsten C, Hansson M, Hedhammar M, Hercules G, Kampf C, Larsson K, Lindskog M, Lodewyckx W, Lund J, Lundeberg J, Magnusson K, Malm E, Nilsson P, Odling J, Oksvold P, Olsson I, Oster E, Ottosson J, Paavilainen L, Persson A, Rimini R, Rockberg J, Runeson M, Sivertsson A, Skollermo A, Steen J, Stenvall M, Sterky F, Stromberg S, Sundberg M, Tegel H, Tourle S, Wahlund E, Walden A, Wan J, Wernerus H, Westberg J, Wester K, Wrethagen U, Xu LL, Hober S, Ponten F. (2005). A human protein atlas for normal and cancer tissues based on antibody proteomics. Mol Cell Proteomics. 4, 1920–32.


V

Statistics and Bioinformatics in Clinical Proteomics Data Analysis

16

2D-PAGE Maps Analysis

Emilio Marengo, Elisa Robotti, and Marco Bobba

Summary

Because of the low reproducibility of 2D gel electrophoresis and the complexity of the maps it produces, effective and robust methods for comparing and classifying 2D maps are a fundamental tool for the development of automated diagnostic methods. This chapter reviews classical and recently developed methods for comparing 2D maps. The methods covered include both the analysis of spot volume datasets with multivariate statistical tools (pattern recognition methods, cluster analysis, and classification methods) and the analysis of 2D map images by fuzzy logic, three-way PCA, and moment functions.

The theoretical basis of each procedure is briefly introduced, together with a review of the most interesting applications in the recent literature.

Key Words: principal component analysis; cluster analysis; classification; SIMCA; image analysis; moment functions; fuzzy logic; three-way PCA; multidimensional scaling; spot volume data.

1. Introduction

The development of new and effective methods for identifying differences between groups of 2D-PAGE maps is one of the frontiers of proteomics research aimed at reliable diagnostic and prognostic tools. Comparing sets of 2D maps is in fact not a trivial problem, owing to some experimental limitations of 2D gel electrophoresis. Despite being a very powerful tool for separating the proteins in cellular extracts, 2D gel electrophoresis is characterized by quite low reproducibility:

From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols<br />

Edited by: A. Vlahou © Humana Press, Totowa, NJ<br />


this limitation is dictated both by the nature of the specimen and by the instrumental procedure used to obtain the final electrophoretic maps. The biological samples analyzed often contain complex protein mixtures covering a wide range of structures, properties, and molecular weights. The complexity of the sample is reflected in the complexity of the final map, which may contain hundreds or thousands of spots, as well as spurious spots due to impurities or side reactions. The second factor reducing reproducibility in 2D gel electrophoresis is the instrumental technique itself, from sample preparation to the electrophoretic run. Sample pretreatment follows a multi-step procedure consisting of several purification and extraction steps, which increases the overall experimental uncertainty. In addition, the final result depends strongly on a large number of instrumental factors that must be kept under strict control: polymerization conditions, temperature, running conditions, and time and temperature during the staining and destaining steps. An unexpected or random variation in one or more of these parameters can strongly affect the reproducibility of the position, size, and intensity of the spots on the final map.

The large number of spots on each map and the low reproducibility of 2D gel electrophoresis hinder a clear classification of samples and make it quite difficult to use 2D-PAGE maps for diagnostic and prognostic purposes or for drug-design studies. From this perspective, effective and robust methods for comparing and classifying 2D maps are key to developing automated diagnostic tools based on proteomics. To account for the low reproducibility of the experimental protocol, sets of replicate 2D maps are usually run and compared.

The classical analysis of 2D-PAGE maps is usually carried out with dedicated software packages, which are briefly described here. The second part of the chapter focuses on multivariate statistical tools for a more effective analysis of the so-called “spot volume datasets” produced by software packages dedicated to 2D-PAGE image analysis.

The final part of the chapter is devoted to the most advanced applications of image analysis tools for the study and classification of 2D maps; these methods are based on fuzzy logic principles coupled with multivariate statistical tools, or on the calculation of mathematical moments of the images.

2. Gel Analysis Via Dedicated Software Packages

Sets of 2D maps are usually analyzed with dedicated software packages; among the most popular are PDQuest, Progenesis, Melanie, Z3, Phoretix, and Z4000, but many other solutions are commercially available.



Many papers appeared in the last decade about the development of software<br />

packages (1,2,3), the comparison of the performances of different packages<br />

(4,5), or the widening of particular topics like point pattern matching, reproducibility,<br />

matching efficiency and spot overlapping (6,7,8,9,10,11,12,13,14,<br />

15,16).<br />

All software solutions presently available analyze sets of 2D maps on the basis of digitized gel images acquired by laser densitometry, phosphor imaging, or a CCD camera. The analysis of the digitized images involves several steps, described here in more detail with particular reference to one of the most widely used packages, PDQuest (17,18,19):

1. Scanning. Gel images are converted into pixel data: each pixel is characterized by a pair of x–y coordinates indicating its position on the 2D image and a z value corresponding to its signal intensity. Each map is thus turned into a set of pixels described by their optical density (OD) values.

2. Filtering images. This step pre-processes the gel images, removing noise, background effects, specks, and other imperfections.

3. Automated spot detection. Spots are identified on each gel independently. The operator selects the faintest spot (to set the sensitivity and minimum peak value parameters), the smallest spot (to set the size scale parameter), and the largest spot to be detected. A final smoothing step removes spots close to the background level. Spots are then located on the gel image (i.e., each spot is identified by a pair of x–y coordinates indicating its position on the gel), fitted by ideal Gaussian distributions, and quantified by the sum of the OD values within each Gaussian distribution.

4. Matching of protein profiles. Sets of 2D gels are then edited and matched to one another in a “match set”: each identified spot is matched to the same spot in all the other gels of the set under investigation. For this purpose, landmarks are needed, i.e., reference spots that PDQuest uses to align and position the match set members for matching. The landmarks are used to set parameters accounting for the distortions among the gels to be compared.

5. Normalization. Normalization is then applied to the maps to compensate for gel-to-gel variations due to sample preparation and loading, staining and destaining procedures, etc.

6. Differential analysis. This step compares different sets of 2D maps, e.g., control and diseased samples. For each group of 2D maps, a “sample group” is created containing the average values of all the spots identified. Once the sample groups have been created (e.g., control and diseased samples), they are compared to find differentially expressed proteins. Usually, only spots showing at least a two-fold change (i.e., a 100% variation) are accepted as significantly changed.

7. Statistical analysis. Statistical analysis is then applied to the differentially expressed proteins. It is usually based on Student’s t-test (p


294 Marengo et al.<br />

The final result of the overall procedure therefore depends heavily on the accuracy of the software package adopted, so the choice of the most suitable analysis software is critical.
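Steps 5 and 6 can be illustrated with a minimal Python sketch. It assumes normalization by total spot volume (one common option; the text does not say which scheme PDQuest applies), applies the two-fold criterion to the group means, and computes a pooled-variance Student’s t statistic for each spot. The function names are hypothetical, and converting t into a p-value (e.g., from tables with n₁ + n₂ − 2 degrees of freedom) is left out.

```python
import numpy as np


def normalize_total_volume(volumes):
    """Rescale each gel (row) so that its spot volumes sum to the mean total.

    Assumption: total-volume normalization, one of several possible schemes.
    """
    volumes = np.asarray(volumes, dtype=float)
    totals = volumes.sum(axis=1, keepdims=True)
    return volumes / totals * totals.mean()


def differential_spots(control, disease, min_fold=2.0):
    """Flag spots whose group means differ at least min_fold times.

    control, disease: gels x spots arrays of normalized spot volumes.
    Returns the indices of the flagged spots and a pooled-variance
    Student's t statistic for every spot (p-values are not computed here).
    """
    control = np.asarray(control, dtype=float)
    disease = np.asarray(disease, dtype=float)
    mean_c, mean_d = control.mean(axis=0), disease.mean(axis=0)
    eps = 1e-12  # guards against division by zero for absent spots
    # symmetric fold change: larger group mean over smaller group mean
    fold = np.maximum(mean_c, mean_d) / (np.minimum(mean_c, mean_d) + eps)
    n_c, n_d = len(control), len(disease)
    pooled_var = ((n_c - 1) * control.var(axis=0, ddof=1)
                  + (n_d - 1) * disease.var(axis=0, ddof=1)) / (n_c + n_d - 2)
    t = (mean_d - mean_c) / np.sqrt(pooled_var * (1 / n_c + 1 / n_d) + eps)
    return np.flatnonzero(fold >= min_fold), t
```

A spot flagged by the fold criterion would still need the statistical test of step 7 before being reported as differentially expressed.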

Commercial software packages, although powerful image analysis tools, have two main disadvantages. The first is human interference, introduced mainly in steps 2 and 3. The second concerns replicates: different groups of 2D maps are compared on the basis of the “sample group” obtained for each class, i.e., a gel containing the average of the information common to all replicates. In this way, individual replicates are not considered, and the reproducibility of the maps is not properly taken into account.

3. Analysis of Spot Volume Datasets

Spot volume datasets produced by the dedicated software after normalization and differential analysis (steps 5 and 6 of the procedure described in Section 2) are particularly suitable for investigation by multivariate statistical tools, both because of their large dimensionality (a large number of spots is identified on each map) and because of the difficulty of identifying the small differences existing between groups of maps when hundreds of spots are detected simultaneously on each sample.

From this point of view, multivariate statistical tools represent the best alternative, since they provide a clear representation of the case under study, consider all the variables simultaneously, and produce robust results, i.e., results from which the contribution of experimental uncertainty is removed. The statistical techniques recently and successfully applied to spot volume datasets include pattern recognition methods, e.g., Principal Component Analysis (PCA) and Cluster Analysis; classification methods, e.g., Linear Discriminant Analysis (LDA) and Soft Independent Modeling of Class Analogy (SIMCA); and regression methods, e.g., discriminant analysis–partial least squares regression (DA-PLS).

Spot volume data have a multivariate structure, in which several samples (maps) are described by a large number of variables (the spots identified). Multivariate data are usually arranged in matrices for statistical analysis. The datasets considered hereafter are arranged in data matrices of dimensions n × p, where n is the number of samples (one per row of the matrix) and p is the number of variables (one per column of the matrix).
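As an illustration, the following sketch fills an n × p matrix from (gel, spot, volume) records. The function name is hypothetical; it assumes that spots have already been matched across gels and carry integer match identifiers, and it sets the volume of a spot not detected on a gel to zero, which is an assumption rather than a rule stated in the text.

```python
import numpy as np


def build_spot_matrix(records, n_gels, n_spots):
    """Arrange matched spot volumes into an n x p data matrix.

    records: iterable of (gel_index, spot_index, volume) triples, where
    spot_index is the match-set identifier shared by all gels.
    Assumption: a spot not detected on a gel keeps volume 0.
    """
    X = np.zeros((n_gels, n_spots))  # rows = maps (samples), columns = spots
    for gel, spot, volume in records:
        X[gel, spot] = volume
    return X
```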

3.1. Principal Component Analysis

Principal Component Analysis (20,21) is a multivariate pattern recognition method that represents the objects, described by the original variables, in a new reference system characterized by new variables called principal components (PCs; see also Chapter 17). Each PC explains the maximum possible amount of the variance not yet accounted for: the first PC explains the maximum amount of variance contained in the overall dataset, the second explains the maximum residual variance, and so on. Because the PCs are calculated hierarchically, experimental noise and random variations tend to be concentrated in the last PCs.

The PCs maintain a strict relationship with the original reference system, since they are calculated as linear combinations of the original variables. They are also orthogonal to each other and therefore carry uncorrelated information (Fig. 1). The hierarchical way in which the PCs are calculated makes them useful for reducing the dimensionality of the original dataset: a large number of original variables can be replaced by a smaller number of significant PCs that retain a relevant fraction of the total variance of the original dataset while discarding experimental uncertainty (which is accounted for by the last PCs).

Principal Component Analysis provides two main tools for data analysis: the scores and the loadings. The scores are the coordinates of the samples in the new reference system, while the loadings are the coefficients of the linear combination defining each PC, i.e., the weights of the original variables on each PC. Plotting the scores in the space of the PCs makes it possible to identify groups of samples with similar behavior (samples close to one another in the graph) or different characteristics (samples far from each other). The corresponding loading plot then reveals the variables responsible for the similarities or differences observed among the samples in the score plot.
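A minimal NumPy sketch of how scores and loadings can be obtained, assuming mean-centered data and an SVD-based computation (one standard way to compute PCs; the chapter does not prescribe an algorithm). The function name pca is hypothetical.

```python
import numpy as np


def pca(X, n_components):
    """PCA of a samples x variables matrix via singular value decomposition.

    Returns the scores (coordinates of the samples on the PCs), the loadings
    (weights of the original variables on each PC) and the fraction of the
    total variance explained by each retained PC. Columns are only
    mean-centered; autoscale beforehand if all spots should weigh equally.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T  # p x A, orthonormal columns
    scores = Xc @ loadings          # n x A, equal to U[:, :A] * s[:A]
    explained = s**2 / np.sum(s**2)  # decreasing: PCs are hierarchical
    return scores, loadings, explained[:n_components]
```

For a dataset such as that of Fig. 2 (24 maps, more than 1000 spots), plotting the first two score columns against each other gives the score plot, and the first two loading columns give the loading plot.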

Fig. 1. Construction of the principal components.

An example of loading and score plots is shown in Fig. 2. The data belong to four groups of 2D maps (24 maps described by more than 1000 spots). The score plot discriminates the four groups of samples present, one group in each quadrant. The first PC discriminates samples C and A (negative scores on PC1) from samples B and AB (positive scores on PC1); PC2 separates samples C and B (positive scores on PC2) from samples A and AB (negative scores on PC2). The corresponding loading plot explains the separation of the samples into the four groups: sample C shows large intensities of the spots in the 2nd quadrant and small intensities of the spots in the 4th quadrant; sample A shows large intensities of the spots in the 3rd quadrant and very small intensities in the 1st quadrant; samples AB behave opposite to sample C, while sample B behaves opposite to sample A.

Fig. 2. Example of loadings (A) and scores plots (B). Panel A, Loading Plot: PC1 vs. PC2 loadings of the individual spots (V2–V1167). Panel B, Score Plot: PC1 vs. PC2 scores of the 24 maps (C1–C6, A1–A6, B1–B6, AB1–AB6), forming the groups C, A, B, and AB.

From the point of view of identifying groups of samples and variables in a dataset, PCA is a very powerful visualization tool, since it allows multivariate datasets to be represented by means of only a few PCs identified as the most relevant.

In proteomics, loadings are represented more effectively on a virtual 2D map. In proteomic datasets, each variable represents a spot, characterized by a pair of x–y values defining its position on the 2D maps used for the analysis. The loadings of each PC can then be represented on a “virtual” 2D map, where each spot is drawn as a circle centered at the corresponding x–y position and colored on a scale whose tone increases with increasing positive or negative loading. This representation was first proposed by Marengo et al. (22,23).

An example is shown in Fig. 3, where the positive and negative loadings of the first PC are represented for the example of Fig. 2. This representation is clearer than the loading plot of Fig. 2 and allows immediate identification of the spots with the most relevant loadings (darker gray tones) on the corresponding PC.
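The tone assignment behind such a virtual map can be sketched as follows. This is an assumed scheme, not necessarily the one used for Fig. 3: loadings of the requested sign are scaled by the largest absolute loading of that sign, so that a tone of 1 corresponds to the darkest circle. The function name is hypothetical, and the actual drawing (e.g., with a plotting library) is omitted.

```python
import numpy as np


def loading_tones(loadings, x, y, sign="positive"):
    """Map the loadings of one PC onto gray tones for a 'virtual' 2D map.

    loadings, x, y: one entry per spot (x-y = spot position on the gel).
    Returns (x, y, tone) for the spots with the requested sign; tone runs
    from near 0 (loading close to zero) to 1 (largest |loading| of that sign).
    """
    loadings = np.asarray(loadings, dtype=float)
    mask = loadings > 0 if sign == "positive" else loadings < 0
    magnitude = np.abs(loadings[mask])
    tone = magnitude / magnitude.max() if magnitude.size else magnitude
    return np.asarray(x)[mask], np.asarray(y)[mask], tone
```

Drawing one map for sign="positive" and one for sign="negative" reproduces the two-panel layout of Fig. 3.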

3.2. Cluster Analysis

Cluster analysis techniques are pattern recognition methods that help to identify groups of samples or of variables in a dataset by investigating the relationships between the objects or variables. Cluster analysis tools are unsupervised methods: the operator does not know the partition of the dataset and wants to identify potential groups of objects. In this respect, they differ from classification methods, where the operator knows how the objects are divided into classes and wants to obtain the best assignment of the objects to the corresponding classes. The most widely used clustering methods belong to the class of agglomerative hierarchical methods (24), in which the objects are grouped (linked together) on the basis of a measure of their similarity, the most similar objects or groups of objects being linked first. The final result is a graph called a dendrogram: the objects are represented on the x axis and are connected at decreasing levels of similarity along the y axis. An example is reported in Fig. 4, referring to the dataset already presented in Figs. 2 and 3. The clustering technique applied first partitions the samples into two main groups, which can be further separated into three groups by a horizontal cut of the dendrogram at a dissimilarity level of 50%. The four groups present can be identified only by applying a further cut at a dissimilarity level of 25% and counting the vertical lines it intersects. Samples B and AB thus appear to be the most similar groups.


Fig. 3. Positive and negative loadings of PC1 represented on a virtual 2D map (panels: Positive Loadings; Negative Loadings).


Fig. 4. Dendrogram (Ward method, Euclidean distances). Y axis: (D_leg/D_max) × 100. Leaves, from left to right: AB3, AB2, AB5, AB4, AB6, AB1 (group AB); B3, B2, B6, B5, B4, B1 (group B); A4, A1, A6, A2, A5, A3 (group A); C4, C5, C2, C3, C6, C1 (group C).

The results of hierarchical clustering methods depend on the specific similarity measure and on the linkage method, so several methods are usually applied to get a general idea of the number of groups present. In general, the linkage methods that provide the clearest groups are the Ward method and the complete linkage method. As the similarity measure, the Euclidean distance is usually adopted.
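The Ward criterion with Euclidean distances can be sketched in plain NumPy. At each step, the two clusters whose merger produces the smallest increase in the within-cluster sum of squares are linked. This naive loop (hypothetical function name) returns cluster labels for a chosen number of groups rather than drawing the dendrogram; library implementations are far more efficient for large datasets.

```python
import numpy as np


def ward_clusters(X, n_clusters):
    """Agglomerative clustering with Ward's criterion on Euclidean distances.

    Merging stops when n_clusters groups remain; returns one label per
    sample. Naive O(n^3) loop, adequate for a few dozen gels.
    """
    X = np.asarray(X, dtype=float)
    clusters = [[i] for i in range(len(X))]  # start: every sample alone
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                diff = X[clusters[a]].mean(axis=0) - X[clusters[b]].mean(axis=0)
                # increase in within-cluster sum of squares if a and b merge
                cost = na * nb / (na + nb) * (diff @ diff)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    labels = np.empty(len(X), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels
```

Passing the scores of the significant PCs instead of the raw spot volumes gives the PCA-based clustering mentioned in the text.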

Clustering techniques can be applied both to the original variables and to the results of PCA (the scores of the significant PCs); in the latter case, the samples are clustered on the useful sources of variation only, with the contribution of experimental error removed.

3.3. Classification Methods

Classification methods are particularly suitable for the analysis of proteomic spot volume datasets, since the primary need in this application is to assign samples belonging to different groups (e.g., control and diseased individuals) to their proper class. The final aims are both the development of diagnostic tools and the identification of the differences existing between the classes, to shed light on the mechanism of action of a disease or of a new drug.

Here, two of the most widely used classification methods are briefly described: LDA and SIMCA.

3.3.1. Linear Discriminant Analysis

Linear Discriminant Analysis (25,26) belongs to the so-called Bayesian classification methods, since it exploits Bayes’s rule; it classifies the samples in a dataset on the basis of the dataset’s multivariate structure.

In Bayesian classification methods, an object x is assigned to the class g for which the posterior probability P(g|x) is maximum. The posterior probability is computed according to Bayes’s formula:

$$P(g|x) = \frac{P_g\, f(x|g)}{\sum_{k=1}^{G} P_k\, f(x|k)}$$

where
$P_g$ is the prior probability of class g;
$P_k$ is the prior probability of class k, the sum running over all G classes (g included);
$f(x|g)$ is the probability density function of class g; and
$f(x|k)$ is the probability density function of class k.
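Numerically, Bayes’s formula only normalizes the products of priors and densities, as in this minimal sketch (hypothetical function name; the densities are assumed to have been evaluated already for the object x):

```python
import numpy as np


def posterior(priors, densities):
    """Posterior probabilities P(g|x) from priors P_g and densities f(x|g).

    Both arguments hold one value per class; the denominator sums
    P_k f(x|k) over all classes, so the result sums to 1.
    """
    joint = np.asarray(priors, dtype=float) * np.asarray(densities, dtype=float)
    return joint / joint.sum()
```

The object is then assigned to the class with the largest entry, e.g., `np.argmax(posterior(priors, densities))`.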

A common assumption is that each class is described by a multivariate Gaussian probability distribution:

$$P_g\, f(x|g) = \frac{P_g}{(2\pi)^{p/2}\,|S_g|^{1/2}} \exp\left[-\frac{1}{2}(x_i - c_g)^T S_g^{-1} (x_i - c_g)\right]$$

where:
$P_g$ is the prior probability of class g;
$S_g$ is the covariance matrix of class g;
$c_g$ is the centroid of class g; and
$p$ is the number of descriptors.

The quadratic form in the argument of the exponential function,

$$(x_i - c_g)^T S_g^{-1} (x_i - c_g),$$

is the squared Mahalanobis distance between object x and the centroid of class g. It takes the class covariance structure into account, since it contains the covariance matrix, which describes the relationships existing among the variables within each class, i.e., the shape of the class.
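The squared Mahalanobis distance and the Gaussian log-density above can be computed as follows. This sketch (hypothetical function names) uses a linear solve instead of an explicit inverse and a log-determinant for numerical stability; it assumes the covariance matrix is nonsingular, which in practice usually requires working on a reduced set of variables or PCs.

```python
import numpy as np


def squared_mahalanobis(x, centroid, cov):
    """(x - c_g)^T S_g^{-1} (x - c_g) for one object and one class."""
    d = np.asarray(x, dtype=float) - np.asarray(centroid, dtype=float)
    return float(d @ np.linalg.solve(cov, d))


def log_gaussian_density(x, centroid, cov):
    """ln f(x|g) for a multivariate normal class model with p descriptors."""
    p = len(centroid)
    _, logdet = np.linalg.slogdet(cov)  # ln|S_g|, stable for small determinants
    return -0.5 * (p * np.log(2 * np.pi) + logdet
                   + squared_mahalanobis(x, centroid, cov))
```

Unlike the Euclidean distance, the Mahalanobis distance shrinks along directions in which the class is widely spread: with S = diag(4, 1), the point (2, 0) lies at squared distance 1 from the origin.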

Taking −2 times the logarithm of the posterior probability and eliminating the terms common to all classes yields the so-called discriminant score; each object is classified in the class g for which this score is minimum:

$$D_g(x) = (x_i - c_g)^T S_g^{-1} (x_i - c_g) + \ln|S_g| - 2\ln P_g$$

In LDA, the covariance matrix of each class is approximated by the pooled within-class covariance matrix, so that all the classes are considered to have a common shape, i.e., a weighted average of the shapes of the classes present in the dataset. The term ln|S_g| is then the same for every class, and the boundaries between classes become linear.
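A compact sketch of LDA as described above: class centroids, a pooled within-class covariance matrix, and assignment by the minimum discriminant score. The class name is hypothetical; priors default to the class proportions (an assumption), and because the pooled covariance is singular when there are more spots than maps, the model is meant to be fitted on a few selected variables or PC scores.

```python
import numpy as np


class PooledLDA:
    """Linear discriminant analysis with a pooled within-class covariance.

    Uses D_g(x) = (x - c_g)^T S^{-1} (x - c_g) - 2 ln P_g; ln|S| is the
    same for all classes when S is pooled, so it is dropped.
    """

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        n, p = X.shape
        self.centroids_ = np.array([X[y == g].mean(axis=0) for g in self.classes_])
        scatter = np.zeros((p, p))
        for g, c in zip(self.classes_, self.centroids_):
            d = X[y == g] - c
            scatter += d.T @ d  # within-class scatter of class g
        self.cov_ = scatter / (n - len(self.classes_))
        self.priors_ = np.array([np.mean(y == g) for g in self.classes_])
        return self

    def scores(self, X):
        """Discriminant score of every object (rows) for every class (columns)."""
        X = np.asarray(X, dtype=float)
        inv = np.linalg.inv(self.cov_)
        D = np.empty((len(X), len(self.classes_)))
        for k, c in enumerate(self.centroids_):
            d = X - c
            D[:, k] = np.einsum("ij,jk,ik->i", d, inv, d) - 2 * np.log(self.priors_[k])
        return D

    def predict(self, X):
        return self.classes_[np.argmin(self.scores(X), axis=1)]
```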

The variables included in the LDA model, which discriminate the classes present in the dataset, can be chosen by a stepwise algorithm that iteratively selects the most discriminating variables. LDA can be performed either on the original variables or on the PCs; the latter option removes the contribution to variation due to experimental uncertainty.

3.3.2. Soft Independent Modeling of Class Analogy

The SIMCA method (27) is based on the independent modeling of each class by means of PCA: each class is described by its relevant PCs, and the samples of each class are contained in the so-called SIMCA boxes defined by those PCs. This is one of the most important advantages of SIMCA: the classification of each sample is not affected by experimental uncertainty and spurious information, since each class is modeled only by its relevant PCs. Moreover, the method is also useful for small datasets (more variables than objects), since it performs a substantial dimensionality reduction.

Thus, SIMCA classification starts with a PCA calculated on each class independently and with the identification of the relevant PCs of each class, which define the so-called class model. If the data are autoscaled (mean centering followed by division by the standard deviation of each variable), each object $x_{iv}$ belonging to class g is modeled as:

$$x_{ivg} = \sum_{a=1}^{A_g} t_{iag}\, l_{vag} + r_{ivg} \qquad g = 1,\dots,G;\; i = 1,\dots,n_g;\; v = 1,\dots,P$$

(G = number of classes present; $A_g$ = number of significant PCs for class g; $n_g$ = number of samples in class g; P = number of original variables)

where
$t_{iag}$ = score of the i-th object of class g on the a-th PC;
$l_{vag}$ = loading of the v-th variable on the a-th PC of class g; and
$r_{ivg}$ = residual of the i-th object of class g for variable v.

The values estimated by the model are then:

$$\hat{x}_{ivg} = \sum_{a=1}^{A_g} t_{iag}\, l_{vag}$$

while the residuals are defined as:

$$r_{ivg} = x_{ivg} - \hat{x}_{ivg}$$
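The class model above can be sketched in NumPy (hypothetical function names). For brevity the data are only mean-centered within the class; the autoscaling described in the text would additionally divide each variable by its class standard deviation. The second function projects new objects onto an existing class model, which is what classification of an unknown sample requires.

```python
import numpy as np


def simca_class_model(Xg, n_components):
    """Fit the PCA model of one class (rows = objects of class g).

    Returns the class mean, the loadings l_vag (P x A_g) and the residuals
    r_ivg = x_ivg - xhat_ivg of the objects of the class.
    """
    Xg = np.asarray(Xg, dtype=float)
    mean = Xg.mean(axis=0)
    Xc = Xg - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    L = Vt[:n_components].T   # loadings l_vag
    T = Xc @ L                # scores t_iag
    residuals = Xc - T @ L.T  # x_ivg minus sum_a t_iag l_vag
    return mean, L, residuals


def project_residuals(X, mean, L):
    """Residuals of any objects X with respect to an existing class model."""
    Xc = np.asarray(X, dtype=float) - mean
    return Xc - (Xc @ L) @ L.T
```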

The classification rule for object i is based on Fisher’s F-test, so that object i is classified in class g if:

$$\frac{rsd^2_{ig}}{rsd^2_{g}}$$

where
$sd_{vc}$ = standard deviation of variable v in class c;
$rsd_{vc}$ = residual standard deviation of variable v for the objects of class c from the model of their own class.

The modeling power (MP) ranges from 0 (variable irrelevant to the definition of the class model) to 1.
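Since the MP formula itself is missing from this text, the sketch below assumes the standard SIMCA definition MP_vc = 1 − rsd_vc/sd_vc, built from the two quantities just defined. Degrees-of-freedom conventions differ between implementations; using the same divisor n − 1 for both standard deviations (an assumption) keeps MP within [0, 1] when the residuals come from the class’s own PCA model. The function name is hypothetical.

```python
import numpy as np


def modeling_power(Xg, residuals):
    """Assumed definition MP_v = 1 - rsd_v / sd_v for each variable of one class.

    Xg: objects of the class (rows); residuals: their residuals from the
    class model, same shape as Xg. Both standard deviations use n - 1.
    """
    Xg = np.asarray(Xg, dtype=float)
    n = len(Xg)
    sd = np.sqrt(((Xg - Xg.mean(axis=0)) ** 2).sum(axis=0) / (n - 1))
    rsd = np.sqrt((np.asarray(residuals, dtype=float) ** 2).sum(axis=0) / (n - 1))
    with np.errstate(divide="ignore", invalid="ignore"):
        mp = 1.0 - rsd / sd
    return np.where(sd > 0, mp, 0.0)  # constant variables carry no information
```

Residuals equal to zero give MP = 1 (variable fully described by the model), while residuals equal to the centered data (no PCs retained) give MP = 0.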

A typical representation of the MP is given in Fig. 5, where the variables are represented on the x axis and the MP as a bar diagram on the y axis. Figure 5 shows the MPs of class C in the example of Figs. 2–4.

The discrimination power (DP) measures the ability of each variable to discriminate between two classes (c and g) at a time: the greater the DP, the more a variable weighs on the classification of an object in class c or g. It is defined as:

$$DP_{v}^{(c,g)} = \sqrt{\frac{rsd^2_{vcg} + rsd^2_{vgc}}{rsd^2_{vc} + rsd^2_{vg}}}$$

[Fig. 5 plot: bar diagram of MP (y axis, 0.0 to 1.0) versus variable index (x axis, 1 to 1129).]

Fig. 5. Modeling power of a class of six control samples.



where

$rsd^2_{vcg}$ = squared residual standard deviation of variable v of the objects of class c from the model of class g;

$rsd^2_{vgc}$ = squared residual standard deviation of variable v of the objects of class g from the model of class c;

$rsd^2_{vc}$ = squared residual standard deviation of variable v of the objects of class c from the model of their own class;

$rsd^2_{vg}$ = squared residual standard deviation of variable v of the objects of class g from the model of their own class.

The DP is non-negative but has no upper bound. A representation of DP is shown in Fig. 6: the variables are represented on the x axis and the DPs as a bar diagram on the y axis. Figure 6 represents the DPs of classes A and B for the example of Figs. 2–5. In general, when the dataset consists of two classes, a single set of DPs is obtained, corresponding to the discrimination between the two classes present. When more than two classes are present, a set of DPs can be obtained for each pair of classes compared.
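The DP formula can be illustrated with a minimal NumPy sketch. It makes three assumptions. First, the two classes are autoscaled together with one common scaling, so that residuals from either class model are in the same units. Second, each class model is a class-centered PCA. Third, squared residual SDs are taken as plain means over objects, with no degrees-of-freedom correction. The function names are hypothetical.

```python
import numpy as np


def class_model(Z, n_pcs):
    """Class-centred PCA model on commonly scaled data: (class mean, loadings)."""
    mean = Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Z - mean, full_matrices=False)
    return mean, Vt[:n_pcs].T


def residual_var(Z, model):
    """Per-variable mean squared residual of the objects in Z from a class model."""
    mean, L = model
    C = Z - mean
    R = C - C @ L @ L.T
    return (R ** 2).mean(axis=0)


def discrimination_power(Xc, Xg, n_pcs_c, n_pcs_g):
    """DP of every variable between classes c and g."""
    X = np.vstack([Xc, Xg])
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0
    Zc, Zg = (Xc - mu) / sd, (Xg - mu) / sd            # one common scaling for both classes
    mc, mg = class_model(Zc, n_pcs_c), class_model(Zg, n_pcs_g)
    num = residual_var(Zc, mg) + residual_var(Zg, mc)   # rsd2_vcg + rsd2_vgc
    den = residual_var(Zc, mc) + residual_var(Zg, mg)   # rsd2_vc + rsd2_vg
    den = np.maximum(den, np.finfo(float).eps)          # guard against perfectly fitted variables
    return np.sqrt(num / den)
```

With more than two classes, the function would simply be called once for each pair of classes.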

Modeling powers and DPs can also be represented on a color scale on “virtual” 2D maps, as done for the loadings plots, for a clearer representation. An example is given in Fig. 7, where the MPs and DPs shown as bar diagrams in Figs. 5 and 6 are represented on virtual 2D maps.

[Fig. 6 plot: bar diagram of DP (y axis, 0 to 6000) versus variable index (x axis, 1 to 1129).]

Fig. 6. Discriminating power of two classes: treated with drug A (six samples) and with drug B (six samples).



[Fig. 7 panels: "Modeling Power of class C" and "Discrimination Power classes A–B", each shown as a virtual 2D map with x and y axes from 0 to 220.]

Fig. 7. MPs and DPs of Figs. 5 and 6 represented on virtual 2D maps.



3.4. Partial Least Squares (PLS) Regression and Discriminant Analysis–Partial Least Squares (DA-PLS) Regression

Partial least squares is a regression method that uses the information contained in the X data matrix to predict the Y data matrix. The PLS method models the X and Y variables simultaneously to find the latent variables in X that will predict the latent variables in Y. These PLS components (latent variables) are similar to PCs. When there are several responses, they are modeled together in a multivariate way (28,29,30). PLS can be used for discriminant analysis (DA-PLS) by creating one response variable for each category; for proteomic data, this means one response variable for each group of samples. Each response variable is assigned the value 1 for the samples belonging to the corresponding class and 0 for the samples belonging to the other classes.
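As a minimal sketch of DA-PLS, the 0/1 response coding described above can be combined with a basic PLS2 regression. The sketch assumes NumPy and the NIPALS algorithm; the chapter does not state which PLS algorithm the cited works used. Each sample is assigned to the class with the largest predicted response, which is a common but not the only possible decision rule. The names `dummy_y`, `pls2_fit` and `pls_da_predict` are hypothetical.

```python
import numpy as np


def dummy_y(labels):
    """One response column per class: 1 for samples of that class, 0 otherwise."""
    classes = sorted(set(labels))
    Y = np.array([[1.0 if lab == c else 0.0 for c in classes] for lab in labels])
    return Y, classes


def pls2_fit(X, Y, n_comp, max_iter=500, tol=1e-12):
    """NIPALS PLS2 on mean-centred X and Y; returns the means and coefficient matrix B."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    E, F = X - x_mean, Y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        u = F[:, [int(np.argmax(F.var(axis=0)))]]   # start from the most variable response
        for _ in range(max_iter):
            w = E.T @ u
            w /= np.linalg.norm(w)                  # X weights
            t = E @ w                               # X scores
            q = F.T @ t / (t.T @ t)                 # Y loadings
            u_new = F @ q / (q.T @ q)               # Y scores
            converged = np.linalg.norm(u_new - u) < tol
            u = u_new
            if converged:
                break
        p = E.T @ t / (t.T @ t)                     # X loadings
        E = E - t @ p.T                             # deflate X
        F = F - t @ q.T                             # deflate Y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.hstack(W), np.hstack(P), np.hstack(Q)
    B = W @ np.linalg.inv(P.T @ W) @ Q.T            # regression coefficients in X space
    return x_mean, y_mean, B


def pls_da_predict(model, X, classes):
    """Assign each sample to the class with the largest predicted response."""
    x_mean, y_mean, B = model
    Y_hat = (X - x_mean) @ B + y_mean
    return [classes[k] for k in np.argmax(Y_hat, axis=1)]
```

In real use, the number of components and the classification performance would be assessed by cross-validation rather than on the training maps.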

3.5. Applications

3.5.1. Pattern Recognition Methods

Many applications of multivariate tools to the analysis of spot volume datasets have been reported in the literature. PCA can be considered a classical approach: its first applications to spot volume data date back to the mid-1980s, as reported by Anderson (31) in the USA and Tarroux (32) in France.

Anderson (31) reports an application of PCA coupled to cluster analysis to identify the differences among a panel of human cell lines; all the groups were successfully separated using only the subset of proteins present in all the cell lines simultaneously. Tarroux et al. (32) applied PCA in the HERMeS software package, again coupled to cluster analysis.

More recently, both PCA and cluster analysis have been applied to the study of DNA and RNA fragments of several biological systems by the groups of Couto (33), Johansson (34), and Boon (35), and to the immunological diagnosis of hydatidosis (36,37). Other applications come from the group of Kovarova (38,39) and from De Moor et al. (40), who applied multivariate tools to microarray data.

Iwadate et al. (41) applied discriminant analysis to the classification of human gliomas; the proteomic patterns of 85 tissue samples were compared (52 glioblastoma multiforme, 13 anaplastic astrocytomas, 10 astrocytomas, 10 normal brain tissues). Normal brain tissues could be correctly distinguished from glioma tissues by cluster analysis, whose grouping of the samples proved to be significantly correlated with patient survival. Discriminant analysis extracted a set of 37 proteins differentially expressed according to histological grade.

Principal Component Analysis has also been applied to toxicological studies by the groups of Amin (42), Heijne (43), and Anderson (44). The first paper (42) reports a study of the effects of three nephrotoxicants (cisplatin, gentamicin, and puromycin) on the gene expression profile of rats as a function of time after initial administration. PCA and gene expression-based clustering of compound effects confirmed the separation of samples according to dose, time, and degree of renal toxicity. Heijne (43) studied the acute hepatotoxicity induced in rats by bromobenzene administration; the physiological symptoms recorded coincided with many changes in hepatic mRNA and protein content. PCA proved effective in discriminating between control and treated samples for both protein and gene expression profiles; some of the proteins that changed significantly upon bromobenzene treatment were identified by mass spectrometry.

Anderson (44) investigated the effects of five peroxisome proliferators on the protein profile in the livers of treated mice at 5- and 35-day time points. Data for a selected set of 107 liver protein spots, which responded strongly to at least one of the test compounds, were subjected to PCA to search for global changes in protein patterns. PC 1 was identified as a global measure of peroxisome proliferation through its correlation with enzymatic peroxisomal β-oxidation, while PC 2 separated the samples on the basis of exposure time.

Perrot et al. (45) applied PCA to compare the protein expression of gel-entrapped Escherichia coli cells subjected to a cold shock at 4 °C with that of exponential- and stationary-phase free-floating cells. Ten different incubation conditions were considered; each experiment was replicated three times and each gel was run in duplicate. PCA was carried out on the 203 spots identified as reproducibly and significantly different from those corresponding to synthesis at 37 °C, using the average spot intensities for each experimental condition. To remove the variability of staining conditions among the gels, each spot volume was normalized by the sum of the volumes of all the spots detected on the same map, and the data were autoscaled before PCA. The score analysis showed that the protein response of immobilized cells after the cold shock was significantly different from those of exponential- and stationary-phase free-floating organisms. The reasons for these differences could be investigated through the loadings analysis, which also confirmed the identification of nine families of proteins.
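The two preprocessing steps described for this study, total-volume normalization on each map followed by autoscaling, can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' code. It assumes a matrix with one row per map and one column per matched spot, and the function names are hypothetical.

```python
import numpy as np


def normalize_total_volume(V):
    """Divide every spot volume by the total volume of all spots on the same map.

    V has one row per map and one column per matched spot (assumed layout).
    """
    return V / V.sum(axis=1, keepdims=True)


def autoscale(X):
    """Mean-centre each spot (column) and divide it by its standard deviation."""
    sd = X.std(axis=0, ddof=1)
    return (X - X.mean(axis=0)) / np.where(sd > 0, sd, 1.0)


V = np.array([[10.0, 30.0, 60.0],
              [20.0, 60.0, 120.0],   # same profile as map 1, but twice the staining intensity
              [5.0, 5.0, 10.0]])
Z = autoscale(normalize_total_volume(V))
```

In the toy matrix, the second map differs from the first only by a global staining factor, and after normalization the two rows become identical.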

Verhoeckx et al. (46) applied PCA to identify the differences in macrophage maturation in the U937 human lymphoma cell line. PCA proved effective in identifying variations between samples at different macrophage maturation times, whereas standard t-tests identified a smaller number of biomarkers. Another application (47) concerned the characterization of anti-inflammatory compounds.

Other applications from Marengo (22,23,48) exploit PCA coupled to both cluster analysis and SIMCA classification to identify differences between groups of maps. The first application (48) concerns a spot quantity dataset of 435 spots detected in 18 samples of control (untreated) and drug-treated pancreatic ductal carcinoma cells from two different cell lines. The study was designed to identify the role played by the drugs on the different cell lines. PCA allowed a clear discrimination of the four groups of samples using three PCs, and the analysis of the loadings explained the differences among the groups. The results were further confirmed by cluster analysis, and some of the most relevant spots were identified by mass spectrometry. The other two applications (22,23) concern the use of PCA and SIMCA for the classification of proteomic maps.

The first paper (22) presents an application to the adrenal glands of healthy and diseased mice. PCA discriminated the two classes of samples by means of the first PC, whose loadings allowed the identification of the spots responsible for the differences. SIMCA was then applied to classify the samples into the two classes, and it correctly classified all the samples with one PC in the SIMCA model of each class. The analysis of the DPs identified the most discriminating spots. The comparison between the maps showed up- and down-regulation of 84 polypeptide chains out of a total of 700 spots detected.

An analogous approach was also followed to compare the phenotypic expression of the mantle cell lymphoma cell lines GRANTA-519 and MAVER-1 (23).

Marengo proposed an alternative way of displaying PCA loadings and the modeling and discriminating powers calculated by SIMCA. To obtain a clearer representation of the results, the spots showing relevant discriminating and/or modeling power (or relevant loadings) are represented on a virtual 2D-PAGE map. Each discriminating spot is drawn as a circle on the virtual 2D map, at the x–y coordinates identified by standard software packages (PDQuest in this case). The spots are represented on a color scale: darker red tones identify spots with a larger discriminating or modeling power. Adopting such representations in common software packages could provide a valid alternative to the standard visualization of the loadings of each variable in the space of two PCs at a time.

Fujii et al. (49) studied the histological subtypes of lymphoid neoplasms, including 42 cell lines from human lymphoid neoplasms. The discriminating spots were selected by applying different methods in sequence: (1) Wilcoxon or Kruskal–Wallis tests to find spots whose intensity was significantly (p < 0.05) different among the cell line groups; (2) statistical learning methods to rank the spots according to their contribution to the classification; and (3) unsupervised classification methods to validate the robustness of the classification obtained with the selected spots. Thirty-one spots turned out to be significant, 24 of which were identified by mass spectrometry.



Other applications, coupled to cluster analysis and discriminant analysis, concern food quality: several examples have been reported in the literature on cheese classification (50) and on the identification of the protein content of wheat and bread (51,52).

3.5.2. Discriminant Analysis–Partial Least Squares

Many papers on the application of DA-PLS methods have appeared in the last few years. Jessen et al. (53) demonstrated with two examples how information can be extracted from 2DE data by discriminant PLSR with variable selection. The time course of post mortem proteome changes in pig muscle tissue was investigated. A first discriminant PLSR was performed on the spot volume dataset obtained from the usual analysis with dedicated software (Bioimage 2D Analyser, Genomic Solutions, USA), the response (Y) variable being a binary indicator of the individual pig considered or of the sampling time (increasing post mortem time). PLS proved successful in identifying spots characterized by systematic variation. To retain only those spots showing relevant variation among the groups, a variable selection procedure was applied in which non-relevant spots were iteratively eliminated from the model; the final model contained the minimum number of spots giving the best correlation with the response. Variable selection was based on a jack-knifing procedure.

Kleno et al. (54) applied PCA and PLS to identify the mechanism of action of hydrazine toxicity in rat liver samples. PCA was carried out on a data matrix of dimensions 30 × 431 (30 maps, i.e., 5 animals × 3 doses of hydrazine × 2 times after administration; 431 spots detected on the maps). PC 1 separated the samples according to the three dose levels, while PC 4 separated the two times after administration, but only for the highest dose. Since the analysis of the loadings did not allow a clear identification of the most relevant discriminating spots, PLSR was applied to model the Y variable (dose level of hydrazine), with variable selection by jack-knifing. The PLS regression identified spots that play an important role in differentiating the samples according to the administered dose. The results were compared with standard univariate t-tests: some spots identified by PLS were not found to be relevant by the t-tests, because PLS takes into account the correlation structure of the dataset.

Kiaersgard et al. (55) studied the changes in the proteomic profile of cod muscle samples under different storage conditions. Eleven storage conditions were considered, derived from a large factorial design including storage temperature (two levels), storage period (four levels), and chill storage period (five levels). Each sample was replicated twice, and the replicates were run in different batches. PCA grouped the samples according to frozen storage time, but provided no information on the differences between samples with respect to the other two parameters. The study was refined by applying DA-PLS with variable selection by a jack-knife procedure, which allowed the identification of spots relevant to the differentiation of samples according to storage time.

The authors also focused on the optimal normalization of the data before multivariate analysis. Autoscaling is the most widely used normalization method in proteomics, but it risks amplifying noise: variables whose variation is mostly noise are scaled to the same unit variance as informative ones. This is a real concern in proteomics, where experimental uncertainty is large. To avoid this problem, the data were mean centered, and each mean-centered value was then divided by (SD + B), where SD is the standard deviation of each variable and B is a constant term to be optimized. The authors identified the appropriate range for B (2500 in their case) from a scatter diagram of the mean volume of each variable (spot) versus its standard deviation. The best value was then selected, among several candidate values of B, as the one giving the best agreement between the univariate and multivariate approaches.
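The (SD + B) scaling can be written in a few lines of NumPy. The function below is an illustrative sketch, not the authors' implementation, and `offset_scale` is a hypothetical name. B is left as a parameter because its value must be tuned for each dataset, as described above. B = 0 recovers ordinary autoscaling and assumes that no variable is constant.

```python
import numpy as np


def offset_scale(X, B):
    """Mean-centre each variable and divide it by (SD + B) instead of SD alone.

    B = 0 gives ordinary autoscaling; a larger B damps variables whose SD is
    small relative to B, limiting the inflation of noise-dominated spots.
    """
    sd = X.std(axis=0, ddof=1)
    return (X - X.mean(axis=0)) / (sd + B)
```

For example, a spot with SD = 1 divided by (1 + 2500) is almost suppressed, whereas a spot with SD = 2000 divided by 4500 keeps a substantial share of its variance.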

Gottfries et al. (56) applied both PCA and DA-PLS to two different datasets. The first dataset consists of cerebrospinal fluid samples from control individuals and individuals affected by different pathologies (12 controls, 15 with Alzheimer’s disease, 15 with frontotemporal dementia, and 10 with Parkinson’s disease), giving a final dataset of dimension 52 × 96 (96 spots identified on 52 maps). The second dataset consists of liver samples from normal and obese mice (grouped into six groups of four to eight animals each), giving a final dataset of dimension 30 × 603 (30 samples and 603 spots identified). In both cases, the groups of samples could be separated by means of the first three PCs. DA-PLS was then applied to each dataset to identify the spots responsible for the differences between each pair of groups: in all cases, the first latent variable correctly classified the samples.

In another application, Karp et al. (57) demonstrated the effectiveness of PLS-DA in identifying the differences in three proteomic datasets. These included a dataset in which no difference was expected between the two groups of samples, and in this case, as expected, PLS-DA provided no model. Finally, Norden et al. (58) applied PCA and DA-PLS to identify the differences between urine samples of smoking and non-smoking individuals.

The large number of applications of PCA, PLS, and other multivariate tools in proteomics (31–59) clearly shows the importance of multivariate methods in this field. These techniques can identify a larger number of variables (spots) relevant for discrimination between the classes of samples than the classical t-tests usually performed by standard software packages.

4. Image Analysis

The second approach to 2D-PAGE analysis is based on the direct analysis of the 2D map images. This approach could offer a fundamental advantage for proteomic data analysis: it eliminates the contribution of the operator, which is usually considerable when dedicated software packages for proteomic map analysis are used. Several methods for the direct analysis of 2D map images have been reported in the literature, but they are not yet widespread enough to be included in common software packages. These methods mainly exploit artificial neural networks, fuzzy logic principles, and the calculation of mathematical moments. Such procedures represent the frontier of bioinformatics, and some of them are still under development. The main principles of these methods are presented here, together with a review of the most interesting applications reported in the literature.

4.1. Fuzzy Logic

The low reproducibility of 2D gel electrophoresis, pointed out earlier in this chapter, produces significant differences even among maps corresponding to replicates of the same electrophoretic run; these differences consist of changes in spot position, size, and shape. A precise description of the position of each spot in terms of x–y coordinates is therefore very difficult to achieve. The uncertainty in the position and shape of each spot can be treated effectively by fuzzy logic principles. Marengo et al. (60,61,62,63,64) successfully applied fuzzy logic principles coupled to multivariate statistical tools to the analysis of sets of 2D maps.

Their four-step procedure consists of:

1. image digitalization;
2. image defuzzyfication;
3. image refuzzyfication;
4. application of multivariate tools to fuzzy maps.

4.1.1. Image Digitalization

The first step consists of scanning each map with a densitometer, which describes the map as a grid with a given step, each cell containing the optical density (OD), ranging from 0 to 1. The background contribution to the signal of each map is eliminated by applying a cut-off value to each map (generally 0.3–0.4): values below the cut-off are set to zero. The cut-off value has to be optimized independently for each case study.

4.1.2. Image Defuzzyfication

The second step performs the defuzzyfication of each map, which eliminates the dependence of the signal on the destaining protocol. The digitalized image is turned into a grid of binary values: 0 is assigned to cells where no signal is detected, and 1 to cells where a value above the cut-off threshold is present.
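Steps 1 and 2 amount to thresholding the OD grid. The following minimal NumPy sketch assumes the scanned map is already available as a 2D array of OD values in [0, 1]. The default cut-off of 0.35 is a hypothetical choice within the 0.3–0.4 range quoted above and should be optimized for each study. Cells exactly at the cut-off are treated here as signal, a detail the text does not settle.

```python
import numpy as np


def apply_cutoff(od, cutoff=0.35):
    """Step 1: zero every OD value below the cut-off (background removal)."""
    return np.where(od < cutoff, 0.0, od)


def defuzzify(od, cutoff=0.35):
    """Step 2: binary presence/absence grid (1 = signal at or above the cut-off)."""
    return (apply_cutoff(od, cutoff) > 0).astype(int)
```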

4.1.3. Refuzzyfication

The previous step also eliminates the information about spatial uncertainty, since each spot is no longer described by grey-scale values but only by binary values (presence/absence). This step therefore reintroduces the information about spatial uncertainty: each cell with value 1 after step 2 is replaced by a 2D probability function, the most suitable distribution being a 2D Gaussian function. The probability of finding a signal in cell $(x_i, y_j)$ when a signal is already present in cell $(x_k, y_l)$ is given by:

$$f(x_i, y_j; x_k, y_l) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x_i-x_k)^2}{\sigma_x^2} - \frac{2\rho(x_i-x_k)(y_j-y_l)}{\sigma_x\sigma_y} + \frac{(y_j-y_l)^2}{\sigma_y^2}\right]\right\}$$

where

ρ is the correlation between the 1st and 2nd dimensions;

$(x_k, y_l)$ is the position of the occupied cell influencing the cell in position $(x_i, y_j)$;

$\sigma_y$ is the standard deviation along the 1st dimension; and

$\sigma_x$ is the standard deviation along the 2nd dimension.

The correlation between the two dimensions (ρ) is usually fixed at 0, corresponding to the complete independence of the two electrophoretic runs; σx and σy are the standard deviations of the 2D Gaussian function along the x and y axes. Setting them equal corresponds to assuming the same repeatability for the two electrophoretic runs (separation by pH gradient and by molecular mass). In this case, the single parameter whose effect on the final result is investigated is σ = σx = σy. Alternatively, the two parameters can be set to different values, usually σx = 1.5 σy, corresponding to an uncertainty along the second dimension about 50% larger than that along the first dimension. The separation according to molecular mass is in fact expected to show a larger uncertainty, since the gel for the second run is polymerized in-house, whereas the first dimension is run on commercial strips.

Changing the parameter σ (or the parameters σx and σy) modifies the distance over which an occupied cell exerts its effect. Large σ values produce a perturbation acting at larger distances. Smaller values produce a perturbation acting at smaller distances, so that spots have less effect on their neighborhood and the final image is crisper. Therefore, the larger the σ parameter, the higher the level of fuzzyfication applied to the maps. In general, the best results are expected for intermediate values of the parameters, corresponding to maps that are neither too crisp nor too blurred.

As for the choice of probability function, the Gaussian distribution appeared to be the best alternative, since spots can be described as intensity/probability distributions with the highest value at the center of the spot and values decreasing as the distance from the center increases. In addition, the integral of the Gaussian function over the whole plane is 1. The total signal is therefore blurred but remains quantitatively consistent, apart from the small fraction spreading beyond the edges of the map.
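The density above can be evaluated directly. The NumPy function below is a minimal sketch, not the authors' code. `gaussian_2d` is a hypothetical name, and its arguments are the offsets (x_i − x_k, y_j − y_l) together with σx, σy and ρ. Summing it over a sufficiently large integer grid gives a total very close to 1, which illustrates the conservation of signal mentioned above.

```python
import numpy as np


def gaussian_2d(dx, dy, sigma_x, sigma_y, rho=0.0):
    """Bivariate normal density f at offset (dx, dy) = (x_i - x_k, y_j - y_l)."""
    norm = 1.0 / (2.0 * np.pi * sigma_x * sigma_y * np.sqrt(1.0 - rho ** 2))
    zx, zy = dx / sigma_x, dy / sigma_y
    quad = zx ** 2 - 2.0 * rho * zx * zy + zy ** 2
    return norm * np.exp(-quad / (2.0 * (1.0 - rho ** 2)))


# anisotropic setting discussed in the text: sigma_x = 1.5 * sigma_y, rho = 0
offs = np.arange(-40, 41)
DX, DY = np.meshgrid(offs, offs, indexing="ij")
kernel = gaussian_2d(DX, DY, sigma_x=1.5 * 2.0, sigma_y=2.0)
```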

The value of the signal $S_k$ in each cell $(x_i, y_j)$ of fuzzy map k is calculated as the sum of the effects of all the neighboring cells $(x_{i'}, y_{j'})$ containing spots:

$$S_k(x_i, y_j) = \sum_{i',j'=1}^{n} f(x_i, y_j; x_{i'}, y_{j'})$$

Although the sum formally runs over all the cells of the grid, only the neighboring cells, within a distance that depends on the σ parameter, are appreciably influenced by the presence of a signal.

The procedure thus turns each digitalized image into a virtual map containing, in each cell, the sum of the influence of all the spots of the original 2D-PAGE; these virtual maps can be called fuzzy matrices or fuzzy maps. Because real maps contain complex spots of irregular shape, the Gaussian function is associated with each cell rather than with each spot.
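A minimal NumPy sketch of this cell-wise fuzzyfication follows. It assumes ρ = 0, so that the 2D Gaussian factorizes into two 1D Gaussians. The kernel is truncated at about 4σ, and any signal spreading beyond the map borders is simply discarded. `fuzzify` is a hypothetical name, and the truncation radius is an illustrative choice, not a value from the cited papers. Axis 0 of the array is taken as x and axis 1 as y.

```python
import numpy as np


def fuzzify(binary, sigma_x, sigma_y, radius=None):
    """Fuzzy map: each cell receives the summed Gaussian influence of all occupied cells."""
    if radius is None:
        radius = int(np.ceil(4 * max(sigma_x, sigma_y)))  # ignore influence beyond ~4 sigma
    offs = np.arange(-radius, radius + 1)
    gx = np.exp(-0.5 * (offs / sigma_x) ** 2) / (np.sqrt(2 * np.pi) * sigma_x)
    gy = np.exp(-0.5 * (offs / sigma_y) ** 2) / (np.sqrt(2 * np.pi) * sigma_y)
    kernel = np.outer(gx, gy)                 # f(x_i, y_j; x_k, y_l) for rho = 0
    rows, cols = binary.shape
    padded = np.zeros((rows + 2 * radius, cols + 2 * radius))
    for i, j in zip(*np.nonzero(binary)):     # one Gaussian per occupied cell, not per spot
        padded[i:i + 2 * radius + 1, j:j + 2 * radius + 1] += kernel
    # crop back to the map; signal that spilled past the borders is dropped
    return padded[radius:radius + rows, radius:radius + cols]
```

Applied to the binary grid from step 2 with increasing σ, this produces progressively smoother maps of the kind shown in Fig. 8.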

Figure 8 shows an example of fuzzyfication of a map at different σ values: the digitalized and defuzzyfied maps, and the fuzzy maps obtained for five increasing σ values.

4.1.4. Application of Multivariate Tools to Fuzzy Maps

The final fuzzy maps can then be analyzed by several multivariate tools for diagnostic/prognostic purposes. Two approaches are presented here: (1) the coupling of PCA and classification tools; and (2) the use of multidimensional scaling (MDS) techniques.



[Fig. 8 panels: (A) digitalised image; (B) de-fuzzyfied image; (C) σ = 0.50; (D) σ = 1.00; (E) σ = 1.50; (F) σ = 2.00; (G) σ = 2.50; each panel on a 200 × 200 grid.]

Fig. 8. Sample ILL1 from (61): digitalized image (A); defuzzyfied image (B); fuzzyfication at five σ values (C–G).



4.1.4.1. PCA and Classification Methods (61)

Marengo et al. (61) reported an application of PCA and linear discriminant analysis (LDA) to the fuzzy maps of a set of eight 2D maps belonging to control and mantle cell lymphoma samples.

Principal Component Analysis can be applied to images after unwrapping each image: each sample (map) is turned into a series of variables describing the signal at each position of the map. In this case, 200 × 200 pixel images were considered, giving a final set of 40,000 variables for each map. PCA is particularly useful here for detecting a small number of components that account for the differences between the groups of samples, while at the same time performing a dimensionality reduction. The significant PCs were used to build an LDA model to classify the samples. The variables entering the LDA model, which discriminates between the classes present in the dataset, were selected by a forward stepwise algorithm (F-to-enter = 4.0).
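Unwrapping and PCA can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the authors' code. The maps are mean-centered but not autoscaled, because the text does not specify the pretreatment. PCA is computed via SVD, and the stepwise LDA step is not shown. Each loading vector can be reshaped back to the map grid to obtain a virtual loading map like those in Fig. 9. The random arrays below are placeholder data, not real maps.

```python
import numpy as np


def unwrap(maps):
    """Stack of fuzzy maps (n_maps, rows, cols) -> data matrix (n_maps, rows * cols)."""
    maps = np.asarray(maps, dtype=float)
    return maps.reshape(maps.shape[0], -1)


def pca(X, n_pcs):
    """Mean-centred PCA by SVD: scores (n x A), loadings (P x A), explained variance ratio."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_pcs] * s[:n_pcs]
    loadings = Vt[:n_pcs].T
    explained = (s ** 2 / np.sum(s ** 2))[:n_pcs]
    return scores, loadings, explained


rng = np.random.default_rng(0)
fuzzy_maps = rng.random((8, 200, 200))          # placeholder: 8 maps, 40,000 variables each
T, L, ev = pca(unwrap(fuzzy_maps), n_pcs=4)
loading_map_pc1 = L[:, 0].reshape(200, 200)     # virtual loading map for PC 1
```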

The procedure was repeated for different values of the σ parameter in order to find the value providing correct classification of the samples with the smallest number of components in the final LDA model. The best results (100% correct assignments) were obtained for σ values ranging from 1.75 to 2.25, with PC 1 and PC 4 in the final LDA model. The differences between the two groups of samples could then be investigated by analyzing the loadings on the first and fourth PCs.

Figure 9 shows the score plot and the loading plots of PC 1 and PC 4 for σ = 2.00. The loadings are again represented on a virtual map on a color scale: white tones correspond to zones of the map with large positive loadings, and black tones to zones with large negative loadings on the corresponding PC.

4.1.4.2. Multi-Dimensional Scaling

In other applications of multivariate tools to fuzzy maps, Marengo et al. (62,63) describe the use of MDS procedures. MDS performs a substantial dimensionality reduction and provides an effective graphical representation of the data on the basis of the similarities calculated between pairs of objects. MDS searches for the smallest number of dimensions in which the objects can be represented as points whose mutual distances in the new reference system match, as closely as possible, those calculated in the original reference system. In these applications, the calculations were performed by the Kruskal iterative method. The coordinates were searched for with a steepest descent minimization algorithm whose target function is the so-called stress (S), a measure of how well the configuration of points reproduces the original distance matrix.
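The stress criterion and a steepest-descent search can be illustrated with the NumPy sketch below. It is a simplified stand-in for the Kruskal method cited above, resting on several assumptions. It uses the metric form, taking the target distances directly as disparities instead of fitting them by monotone regression as Kruskal's non-metric algorithm does. It also uses a fixed step size and a random starting configuration. All function names are hypothetical.

```python
import numpy as np


def pairwise_distances(Y):
    """Euclidean distance matrix between the rows of Y."""
    diff = Y[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def stress1(D_target, Y):
    """Kruskal's stress-1, using the target distances directly as disparities (metric case)."""
    iu = np.triu_indices(len(Y), k=1)
    d = pairwise_distances(Y)[iu]
    return np.sqrt(np.sum((d - D_target[iu]) ** 2) / np.sum(d ** 2))


def metric_mds(D_target, n_dims=2, n_iter=2000, lr=0.01, seed=0):
    """Steepest descent on the raw stress sum_{i<j} (d_ij - delta_ij)^2."""
    rng = np.random.default_rng(seed)
    n = D_target.shape[0]
    Y = rng.normal(scale=D_target.mean(), size=(n, n_dims))  # random starting configuration
    for _ in range(n_iter):
        D = pairwise_distances(Y)
        np.fill_diagonal(D, 1.0)              # placeholder to avoid 0/0 on the diagonal
        coef = (D - D_target) / D
        np.fill_diagonal(coef, 0.0)
        # gradient of the raw stress with respect to each point's coordinates
        grad = 2.0 * (coef.sum(axis=1)[:, None] * Y - coef @ Y)
        Y = Y - lr * grad
    return Y
```

For fuzzy maps, `D_target` would be a dissimilarity matrix derived from the map-to-map similarities. How that conversion is done is not specified here and would have to be chosen by the user.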



[Fig. 9 panels: (A) score plot of PC1 (x axis) versus PC4 (y axis) for σ = 2.00, with samples HEA1–HEA4 and ILL1–ILL4 labeled; (B) color-coded loading plots "Loadings PC1" and "Loadings PC4" on 200 × 200 virtual maps.]

Fig. 9. Score plot (A) and loading plots (B) of PC1 and PC4 with σ = 2.00.

As in the previous applications based on PCA and LDA, several values of the σ parameter were investigated, and the one providing the best classification was selected. In this case, a similarity matrix has to be built for each value of the parameter.

From the match between the two fuzzy maps k and l, the common signal $SC_{kl}$ (the sum of all signals present in both maps) and the total signal $ST_{kl}$ can be computed:

$$SC_{kl} = \sum_{i=1}^{n} \min\left(S_{k,i},\, S_{l,i}\right)$$

$$ST_{kl} = \sum_{i=1}^{n} \max\left(S_{k,i},\, S_{l,i}\right)$$



where n is the number of cells in the grid and $S_{k,i}$ is the signal of map k in cell i. The similarity index is then computed as:

$$S_{kl} = \frac{SC_{kl}}{ST_{kl}}$$

$S_{kl}$ ranges from 0 (two maps with no common structure) to 1 (two identical maps). In both applications, the authors could identify optimal values of σ that give the best classification of the samples using only one or two dimensions.
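A minimal NumPy sketch of the similarity index follows. It assumes that each fuzzy map is stored as a non-negative array of cell signals on a common grid. The function names are hypothetical.

```python
import numpy as np

def similarity_index(map_k, map_l):
    """S_kl = SC_kl / ST_kl for two non-negative fuzzy maps on the same grid."""
    a = np.asarray(map_k, dtype=float).ravel()
    b = np.asarray(map_l, dtype=float).ravel()
    common = np.minimum(a, b).sum()    # SC_kl: signal shared by both maps
    total = np.maximum(a, b).sum()     # ST_kl: total signal of the pair
    if total == 0:
        raise ValueError("both maps are empty")
    return float(common / total)

def similarity_matrix(maps):
    """Symmetric matrix of S_kl values, with ones on the diagonal."""
    n = len(maps)
    S = np.ones((n, n))
    for k in range(n):
        for l in range(k + 1, n):
            S[k, l] = S[l, k] = similarity_index(maps[k], maps[l])
    return S
```

A dissimilarity such as 1 − $S_{kl}$ could then serve as the target distance matrix for MDS. This conversion is our assumption, since the one used in refs 62 and 63 is not stated here. The whole computation would be repeated for each candidate value of σ.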

4.2. Moment Functions

Moment functions have been widely used in image analysis. Applications include invariant pattern recognition, object classification, pose estimation, image coding, and reconstruction (65,66,67,68,69). A set of moments computed from a digital image generally represents global characteristics of the image shape. It also provides considerable information about different types of geometrical features of the image. Geometric moments were the first to be applied to images, as they are computationally very simple. As research in image processing progressed, many new types of moment functions were introduced, such as orthogonal moments, rotational moments, and complex moments. These are useful tools in pattern recognition and can describe object features such as shape, area, border, location, and orientation. Naturally, each moment function has its own advantages in specific applications.
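As a simple illustration (our own sketch, not taken from the cited works), NumPy can compute geometric moments $m_{pq} = \sum_x \sum_y x^p y^q f(x, y)$ and the corresponding central moments, which are taken about the intensity centroid. The function names, and the convention that x is the column index and y the row index, are assumptions made for the example.

```python
import numpy as np

def geometric_moment(img, p, q):
    """m_pq = sum over pixels of x**p * y**q * f(x, y)."""
    img = np.asarray(img, dtype=float)
    y, x = np.indices(img.shape)           # y: row index, x: column index
    return float((x ** p * y ** q * img).sum())

def central_moment(img, p, q):
    """mu_pq: same sum, with x and y measured from the intensity centroid."""
    img = np.asarray(img, dtype=float)
    m00 = img.sum()
    y, x = np.indices(img.shape)
    xc = (x * img).sum() / m00             # centroid = (m10 / m00, m01 / m00)
    yc = (y * img).sum() / m00
    return float(((x - xc) ** p * (y - yc) ** q * img).sum())
```

Central moments are measured from the centroid, so shifting a spot pattern within the image leaves them unchanged, whereas raw moments such as $m_{10}$ change. This is the simplest case of the invariance properties that make moments useful for pattern recognition.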

The most important and most widely used moments are orthogonal moments, e.g., Legendre moments (70,71,72) and Zernike moments (73,74,75). These can attain a zero value of the redundancy measure within a set of moment functions, so each orthogonal moment corresponds to an independent characteristic of the image. In other words, moments with orthogonal basis functions can represent the image by a set of mutually independent descriptors with minimal information redundancy. Orthogonal moments have the additional property of being more robust than non-orthogonal ones in the presence of image noise. They also permit the analytical reconstruction of an image intensity function from a finite set of moments, using the inverse moment transform.

Legendre moments are the most widely used orthogonal moments and can be used as feature descriptors for the classification of 2D-PAGE maps.

The main advantage of using Legendre moments to cluster the maps is that invariance to translation, scale, and rotation can be obtained. In other words, the original maps can be used for classification without any pre-treatment, and complex commercial software can be avoided entirely.



The number of calculated moments is very large, and many of them contain no information relevant to correctly classifying the 2D-PAGE maps. A method for selecting the moments with the highest discriminating power (DP), such as LDA, must therefore be applied.

4.2.1. Legendre Moments

The Legendre polynomials form a complete orthogonal set on the interval [−1, 1]. Moments with Legendre polynomials as kernel functions were first introduced by Teague (68). The kernels of Legendre moments are products of Legendre polynomials defined along the rectangular image coordinate axes, inside the square [−1, 1] × [−1, 1].

The two-dimensional Legendre moments of order (p + q) of an image intensity map f(x, y) are defined as:

$$L_{pq} = \frac{(2p+1)(2q+1)}{4} \int_{-1}^{1} \int_{-1}^{1} P_p(x)\, P_q(y)\, f(x, y)\, dx\, dy, \qquad x, y \in [-1, 1]$$

where the Legendre polynomial $P_p(x)$ of order p is given by:

$$P_p(x) = \frac{1}{2^p} \sum_{\substack{k=0 \\ p-k\ \text{even}}}^{p} (-1)^{\frac{p-k}{2}} \frac{(p+k)!\, x^k}{\left(\frac{p-k}{2}\right)!\, \left(\frac{p+k}{2}\right)!\, k!}$$

The recurrence relation of the Legendre polynomials $P_p(x)$ is:

$$P_p(x) = \frac{(2p-1)\, x\, P_{p-1}(x) - (p-1)\, P_{p-2}(x)}{p}$$

where $P_0(x) = 1$, $P_1(x) = x$, and p > 1. The Legendre polynomials are defined on the interior of [−1, 1]. A square image of N × N pixels with intensity function f(i, j), 0 ≤ i, j ≤ (N − 1), is therefore scaled to the region −1 < x, y < 1.



The image function can be reconstructed from the calculated moments by the following inverse transformation:

$$f(i, j) \approx \sum_{p=0}^{p_{\max}} \sum_{q=0}^{q_{\max}} L_{pq}\, P_p(x_i)\, P_q(y_j)$$
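A short NumPy sketch can check and apply the formulas above. It builds $P_p(x)$ with the recurrence and cross-checks it against the closed-form sum. It then approximates $L_{pq}$ for an N × N image and inverts the transform. The exact pixel-to-coordinate mapping and discretization are not given in this section, so we assume pixel centres $x_i = -1 + (2i+1)/N$. This choice keeps every coordinate strictly inside (−1, 1). We also assume a simple rectangle-rule sum with pixel area $(2/N)^2$. All function names are hypothetical.

```python
import numpy as np
from math import factorial

def legendre_table(order, x):
    """Rows P_0(x) ... P_order(x) computed with the three-term recurrence."""
    x = np.asarray(x, dtype=float)
    P = np.empty((order + 1, x.size))
    P[0] = 1.0
    if order >= 1:
        P[1] = x
    for p in range(2, order + 1):
        P[p] = ((2 * p - 1) * x * P[p - 1] - (p - 1) * P[p - 2]) / p
    return P

def legendre_explicit(p, x):
    """Closed-form sum over k with p - k even; used only to cross-check the recurrence."""
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for k in range(p % 2, p + 1, 2):               # keeps p - k even
        m = (p - k) // 2
        coef = (-1) ** m * factorial(p + k) / (factorial(m) * factorial((p + k) // 2) * factorial(k))
        total += coef * x ** k
    return total / 2 ** p

def pixel_coords(N):
    """Assumed mapping of pixel index i = 0..N-1 to the centre x_i = -1 + (2i + 1)/N."""
    return -1.0 + (2.0 * np.arange(N) + 1.0) / N

def legendre_moments(img, order):
    """L_pq for 0 <= p, q <= order, using a rectangle rule for the double integral."""
    img = np.asarray(img, dtype=float)
    N = img.shape[0]
    if img.shape != (N, N):
        raise ValueError("a square N x N image is expected")
    P = legendre_table(order, pixel_coords(N))     # shape (order + 1, N)
    k = 2.0 * np.arange(order + 1) + 1.0
    norm = np.outer(k, k) / 4.0                    # (2p + 1)(2q + 1) / 4
    pixel_area = (2.0 / N) ** 2                    # dx * dy
    return norm * (P @ img @ P.T) * pixel_area     # row index i -> x, column j -> y

def reconstruct(L, N):
    """Inverse transform: f(i, j) ~ sum_p sum_q L_pq P_p(x_i) P_q(y_j)."""
    P = legendre_table(L.shape[0] - 1, pixel_coords(N))
    return P.T @ L @ P
```

With order = 100 on a 200 × 200 image, `legendre_moments` returns a 101 × 101 array of moments. The rectangle rule becomes less accurate as the order grows, and dedicated computation schemes for Legendre moments are discussed in refs 71 and 72.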

Marengo et al. (76) report an interesting application of Legendre moments to a set of 2D-PAGE maps belonging to two different cell lines of control (untreated) and drug-treated pancreatic ductal carcinoma cells.

The aim of the work was to classify the 18 samples correctly using the Legendre moments as discriminant variables.

Each 2D-PAGE map was automatically digitized and described by a 200 × 200 pixel matrix. The value of each pixel, ranging from 0 to 1, indicates the staining intensity at that position.

The Legendre moments of the 18 digitized images were calculated up to a maximum order of 100. Each resulting moment matrix held the global information of the corresponding 2D-PAGE map.

The final dataset contained 18 samples and 10,201 variables, i.e., the moments $L_{pq}$ with 0 ≤ p, q ≤ 100 (101 × 101). The number of variables was very large, and many of them were either redundant or unrelated to correctly classifying the samples. A method for selecting the variables with the highest discriminating power was therefore applied (forward stepwise LDA with F-to-enter = 4.0). The stepwise LDA showed that only six different Legendre moments were needed to classify the 18 samples correctly.
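A generic forward stepwise selection of this kind can be sketched with NumPy, assuming the usual Wilks' lambda criterion. At each step, the candidate variable with the largest partial F is entered, and selection stops when no candidate reaches the F-to-enter threshold. This illustrates the general technique only; it is not the implementation used in ref 76, whose software is not named here. The stopping rule that keeps the within-group matrix invertible is also our assumption.

```python
import numpy as np

def wilks_lambda(X, y):
    """Wilks' lambda = det(W) / det(T) for the columns of X; NaN if either is singular."""
    Xc = X - X.mean(axis=0)
    T = Xc.T @ Xc                                   # total SSCP matrix
    W = np.zeros_like(T)
    for g in np.unique(y):
        Dg = X[y == g] - X[y == g].mean(axis=0)
        W += Dg.T @ Dg                              # pooled within-group SSCP matrix
    sign_w, logdet_w = np.linalg.slogdet(W)
    sign_t, logdet_t = np.linalg.slogdet(T)
    if sign_w <= 0 or sign_t <= 0:
        return float("nan")
    return float(np.exp(logdet_w - logdet_t))

def forward_stepwise(X, y, f_to_enter=4.0):
    """Greedy forward selection using the partial F statistic derived from Wilks' lambda."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, m = X.shape
    g = len(np.unique(y))
    selected, lam = [], 1.0                         # lambda of the empty model is 1
    while len(selected) < n - g - 1:                # keep W non-singular
        p = len(selected)
        best_f, best_j, best_lam = -np.inf, None, None
        for j in range(m):
            if j in selected:
                continue
            lam_new = wilks_lambda(X[:, selected + [j]], y)
            if not np.isfinite(lam_new) or lam_new <= 0:
                continue
            # partial F with (g - 1, n - g - p) degrees of freedom
            f = (n - g - p) / (g - 1) * (lam / lam_new - 1.0)
            if f > best_f:
                best_f, best_j, best_lam = f, j, lam_new
        if best_j is None or best_f < f_to_enter:
            break
        selected.append(best_j)
        lam = best_lam
    return selected
```

With 18 samples in two groups, this sketch allows at most 15 variables to enter before the within-group matrix would become singular.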

These results demonstrate that Legendre moments can be successfully applied to the fast classification and similarity analysis of 2D-PAGE maps.

4.3. Other Methods

Schultz et al. (78), in addition to applying PCA and PLS to spot volume data, applied PCA to the analysis of gel images after digitization and unwarping. The choice of the alignment procedure for the sets of gels proved decisive for the final result. PCA proved effective in identifying the groups of maps present.

Marengo et al. (77) also applied three-way PCA to identify the differences among groups of 2D maps. Proteomic datasets are suited to three-way methods because of their three-way structure: the first dimension is the pH gradient, the second is the molecular mass, and the third is the samples. In three-way PCA, the observed modes (conventionally called I, J, and K) are condensed into a smaller number of more fundamental modes. Each element of a reduced mode expresses a particular structure existing among all or some of the elements of the associated observed mode. The final result consists of three sets of loadings together with a core array describing the relationships among them. Each of the three sets of loadings can be displayed and interpreted in the same way as a score plot in standard PCA. Before three-way PCA, the data were transformed to scale all the samples and make them comparable. For this purpose, maximum scaling was selected, and each digitized 2D-PAGE map was scaled to its own maximum value. The method was successfully applied to datasets of human lymph nodes and rat sera, and it identified the main differences among the sets of 2D maps.
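Maximum scaling and a three-way decomposition can be sketched in NumPy as follows. We assume the maps are stacked in an array of shape (samples, pH, molecular mass). We use a truncated higher-order SVD (HOSVD), which returns one orthonormal loading matrix per mode and a core array. HOSVD is a common starting point for Tucker3 alternating least squares, but it is not necessarily the algorithm used by Marengo et al. The function names are hypothetical.

```python
import numpy as np

def max_scale(maps):
    """Divide each digitized map (axis 0 = sample) by its own maximum value."""
    maps = np.asarray(maps, dtype=float)
    peak = maps.reshape(maps.shape[0], -1).max(axis=1)
    return maps / peak[:, None, None]

def unfold(T, mode):
    """Matricize T so that the given mode indexes the rows."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    """Multiply tensor T along `mode` by matrix M of shape (new_dim, T.shape[mode])."""
    return np.moveaxis(np.tensordot(M, np.moveaxis(T, mode, 0), axes=1), 0, mode)

def tucker_hosvd(T, ranks):
    """Truncated HOSVD: one loading matrix per mode plus the core array."""
    loadings = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        loadings.append(U[:, :r])
    core = T
    for mode, U in enumerate(loadings):
        core = mode_product(core, U.T, mode)        # project onto each mode's loadings
    return core, loadings

def tucker_reconstruct(core, loadings):
    """Rebuild the (approximate) data array from the core and the loadings."""
    T = core
    for mode, U in enumerate(loadings):
        T = mode_product(T, U, mode)
    return T
```

The sample-mode loading matrix can be plotted like a PCA score plot. The core array shows how the components of the three modes interact.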

References

1. Mahon, P., Dupree, P., (2001) Quantitative and reproducible two-dimensional gel analysis using Phoretix 2D Full, Electrophoresis 22, 2075–2085
2. Rubinfeld, A., Keren-Lehrer, T., Hadas, G., Smilansky, Z., (2003) Hierarchical analysis of large-scale two-dimensional gel-electrophoresis experiments, Proteomics 3, 1930–1935
3. Anderson, N.L., Taylor, J., Scandora, A.E., Coulter, B.P., Anderson, N.G., (1981) The TYCHO system for computer analysis of two-dimensional gel electrophoresis patterns, Clinical Chemistry 27 (11), 1807–1820
4. Rosengren, A.T., Salmi, J.M., Aittokallio, T., Westerholm, J., Lahesmaa, R., Nyman, T.A., Nevalainen, O.S., (2003) Comparison of PDQuest and Progenesis software packages in the analysis of two-dimensional electrophoresis gels, Proteomics 3, 1936–1946
5. Raman, B., Cheung, A., Marten, M.R., (2002) Quantitative comparison and evaluation of two commercially available, two-dimensional electrophoresis image analysis software packages, Z3 and Melanie, Electrophoresis 23, 2194–2202
6. Panek, J., Vohradsky, J., (1999) Point pattern matching in the analysis of two-dimensional gel electropherograms, Electrophoresis 20, 3483–3491
7. Pleissner, K.P., Hoffmann, F., Kriegel, K., Wenk, C., Wegner, S., Sahlström, A., Oswald, H., Alt, H., Fleck, E., (1999) New algorithmic approaches to protein spot detection and pattern matching in two-dimensional electrophoresis gel databases, Electrophoresis 20, 755–765
8. Voss, T., Haberl, P., (2000) Observations on the reproducibility and matching efficiency of two-dimensional electrophoresis gels: consequences for comprehensive data analysis, Electrophoresis 21, 3345–3350
9. Cutler, P., Heald, G., White, I.R., Ruan, J., (2003) A novel approach to spot detection for two-dimensional gel electrophoresis images using pixel value collection, Proteomics 3, 392–401
10. Molloy, M.P., Brzezinski, E.E., Hang, J., McDowell, M.T., VanBogelen, R.A., (2003) Overcoming technical variation and biological variation in quantitative proteomics, Proteomics 3, 1912–1919



11. Moritz, B., Meyer, H.E., (2003) Approaches for the quantification of protein concentration ratios, Proteomics 3, 2208–2220
12. Wheelock, A.M., Buckpitt, A.R., (2005) Software-induced variance in two-dimensional gel electrophoresis image analysis, Electrophoresis 26, 4508–4520
13. Almeida, J.S., Stanislaus, R., Krug, E., Arthur, J.M., (2005) Normalisation and analysis of residual variation in two-dimensional gel electrophoresis for quantitative differential proteomics, Proteomics 5, 1242–1249
14. Pietrogrande, M.C., Marchetti, N., Dondi, F., Righetti, P.G., (2003) Spot overlapping in two-dimensional polyacrylamide gel electrophoresis maps: relevance to proteomics, Electrophoresis 24, 217–224
15. Pietrogrande, M.C., Marchetti, N., Dondi, F., Righetti, P.G., (2002) Spot overlapping in two-dimensional polyacrylamide gel electrophoresis separations: a statistical study of complex protein maps, Electrophoresis 23, 283–291
16. Campostrini, N., Areces, L.B., Rappsilber, J., Pietrogrande, M.C., Dondi, F., Pastorino, F., Ponzoni, M., Righetti, P.G., (2005) Spot overlapping in two-dimensional maps: a serious problem ignored for much too long, Proteomics 5, 2385–2395
17. Garrels, J.I., (1979) Two dimensional gel electrophoresis and computer analysis of proteins synthesized by clonal cell lines, J. Biol. Chem. 254, 7961–7977
18. Garrels, J.I., Farrar, J.T., Burwell IV, C.B., (1984) In: Celis, J.E., Bravo, R. (Eds.), Two-dimensional Gel Electrophoresis of Proteins, Academic Press, Orlando, FL, USA, pp. 38–91
19. Garrels, J.I., (1989) The QUEST system for quantitative analysis of two-dimensional gels, J. Biol. Chem. 264, 5269–5282
20. Massart, D.L., Vandeginste, B.G.M., Deming, S.N., Michotte, Y., Kaufman, L., (1988) Chemometrics: A Textbook, Elsevier, Amsterdam
21. Vandeginste, B.G.M., Massart, D.L., Buydens, L.M.C., De Jong, S., Lewi, P.J., Smeyers-Verbeke, J., (1998) Handbook of Chemometrics and Qualimetrics: Part B, Elsevier, Amsterdam
22. Marengo, E., Robotti, E., Righetti, P.G., Campostrini, N., Pascali, J., Ponzoni, M., (2004) Study of Proteomic changes associated with healthy and tumoral murine samples in Neuroblastoma by Principal Component Analysis and classification methods, Clinica Chimica Acta 345, 55–67
23. Marengo, E., Robotti, E., Bobba, M., Liparota, M.C., Antonucci, F., Rustichelli, C., Zamò, A., Chilosi, M., Hamdan, M., Righetti, P.G., (2006) Characterisation of the proteomic profiles of two human lymphoma cell lines by two-dimensional gel-electrophoresis and multivariate statistical tools, Electrophoresis 27, 484–494
24. Massart, D.L., Kaufman, L., (1983) In: Elving, P.J., Winefordner, J.D. (Eds.), The Interpretation of Analytical Chemical Data by the Use of Cluster Analysis, Wiley, New York, USA
25. Eisenbeis, R.A. (Ed.), (1972) Discriminant Analysis and Classification Procedures: Theory and Applications, Lexington, USA



26. Klecka, W.R. (Ed.), (1980) Discriminant Analysis, Sage Publications, Beverly Hills, USA
27. Wold, S., (1976) Pattern recognition by means of disjoint principal components models, Pattern Recognition 8, 127–139
28. Martens, H., Naes, T., (1989) Multivariate Calibration, Wiley, London
29. Kleinbaum, D., Kupper, L., Muller, K., (1988) Applied Regression Analysis and Other Multivariate Methods, 2nd ed., PWS-Kent, Boston
30. De Noord, O.E., (1994) Multivariate calibration standardization, Chemometr. Intell. Lab. Syst. 25, 85–97
31. Anderson, N.L., Hofmann, J.P., Gemmell, A., Taylor, J., (1984) Global approaches to quantitative analysis of gene-expression patterns observed by use of two-dimensional gel electrophoresis, Clin. Chem. 30, 2031–2036
32. Tarroux, P., Vincens, P., Rabilloud, T., (1987) HERMeS: a second generation approach to the automatic analysis of two-dimensional electrophoresis gels. Part V: Data analysis, Electrophoresis 8, 187–199
33. Couto, M.M.B., Vogels, J.T.W.E., Hofstra, H., Huis in 't Veld, J.H.J., van der Vossen, J.M.B.M., (1995) Random amplified polymorphic DNA and restriction enzyme analysis of PCR amplified rDNA in taxonomy: two identification techniques for food-borne yeasts, J. Applied Bacteriology 79 (5), 525–535
34. Johansson, M.L., Quednau, M., Ahrne, S., Molin, G., (1995) Classification of Lactobacillus plantarum by restriction-endonuclease analysis of total chromosomal DNA using conventional agarose-gel electrophoresis, International J. of Systematic Bacteriology 45 (4), 670–675
35. Boon, N., De Windt, W., Verstraete, W., Top, E.M., (2002) Evaluation of nested PCR-DGGE (denaturing gradient gel electrophoresis) with group-specific 16S rRNA primers for the analysis of bacterial communities from different wastewater treatment plants, FEMS Microbiology Ecology 39 (2), 101–112
36. Gadea, I., Ayala, G., Diago, M.T., Cunat, A., Garcia de Lomas, J., (2000) Immunological diagnosis of human hydatid cyst relapse: utility of the enzyme-linked immunoelectrotransfer blot and discriminant analysis, Clinical and Diagnostic Laboratory Immunology 7 (4), 549–552
37. Gadea, I., Ayala, G., Diago, M.T., Cunat, A., Garcia de Lomas, J., (1999) Immunological diagnosis of human cystic echinococcosis: utility of discriminant analysis applied to the enzyme-linked immunoelectrotransfer blot, Clinical and Diagnostic Laboratory Immunology 6 (4), 504–508
38. Kovarova, H., Hajduch, M., Korinkova, G., Halada, P., Krupickova, S., Gouldsworthy, A., Zhelev, N., Strnad, M., (2000) Proteomics approach in classifying the biochemical basis of the anticancer activity of the new olomoucine-derived synthetic cyclin-dependent kinase inhibitor, bohemine, Electrophoresis 21, 3757–3764
39. Kovarova, H., Radzioch, D., Hajduch, M., Sirova, M., Blaha, V., Macela, A., Stulik, J., Hernychova, L., (1998) Natural resistance to intracellular parasites: a study by two-dimensional gel electrophoresis coupled with multivariate analysis, Electrophoresis 19 (8–9), 1325–1331



40. De Moor, B., Marchal, K., Mathys, J., Moreau, Y., (2003) Bioinformatics: organisms from Venus, technology from Jupiter, algorithms from Mars, European Journal of Control 9 (2–3), 237–278
41. Iwadate, Y., Sakaida, T., Hiwasa, T., Nagai, Y., Ishikura, H., Takiguchi, M., Yamaura, A., (2004) Molecular classification and survival prediction in human gliomas based on proteome analysis, Cancer Research 64 (7), 2496–2501
42. Amin, R.A., Vickers, A.E., Sistare, F., Thompson, K.L., Roman, R.J., Lawton, M., Kramer, J., Hamadeh, H.K., Collins, J., Grissom, S., Bennett, L., Tucker, C.J., Wild, S., Kind, C., Oreffo, V., Davis, J.W., Curtiss, S., Naciff, J.M., Cunningham, M., Tennant, R., Stevens, J., Car, B., Bertram, T.A., Afshari, C.A., (2004) Identification of putative gene-based markers of renal toxicity, Environmental Health Perspectives 112 (4), 465–479
43. Heijne, W.H.M., Stierum, R.H., Slijper, M., van Bladeren, P.J., van Ommen, B., (2003) Toxicogenomics of bromobenzene hepatotoxicity: a combined transcriptomics and proteomics approach, Biochemical Pharmacology 65 (5), 857–875
44. Anderson, N.L., Esquer-Blasco, R., Richardson, F., Foxworthy, P., Eacho, P., (1996) The effects of peroxisome proliferators on protein abundances in mouse liver, Toxicology and Applied Pharmacology 137 (1), 75–89
45. Perrot, F., Hebraud, M., Charlionet, R., Junter, G.A., Jouenne, T., (2001) Cell immobilisation induces changes in the protein response of Escherichia coli K-12 to a cold shock, Electrophoresis 22, 2110–2119
46. Verhoeckx, K.C.M., Bijlsma, S., de Groene, E.M., Witkamp, R.F., van der Greef, J., Rodenburg, R.J.T., (2004) A combination of proteomics, principal component analysis and transcriptomics is a powerful tool for the identification of biomarkers for macrophage maturation in the U937 cell line, Proteomics 4 (4), 1014–1028
47. Verhoeckx, K.C.M., Bijlsma, S., Jespersen, S., Ramaker, R., Verheij, E.R., Witkamp, R.F., van der Greef, J., Rodenburg, R.J.T., (2004) Characterization of anti-inflammatory compounds using transcriptomics, proteomics, and metabolomics in combination with multivariate data analysis, International Immunopharmacology 4 (12), 1499–1514
48. Marengo, E., Robotti, E., Cecconi, D., Scarpa, A., Righetti, P.G., (2004) Identification of the regulatory proteins in human pancreatic cancers treated with Trichostatin-A by 2D-PAGE maps and Multivariate Statistical Analysis, Analytical and Bioanalytical Chemistry 379 (7–8), 992–1003
49. Fujii, K., Kondo, T., Yokoo, H., Yamada, T., Matsuno, Y., Iwatsuki, K., Hirohashi, S., (2005) Protein expression pattern distinguishes different lymphoid neoplasms, Proteomics 5, 4274–4286
50. Dewettinck, K., Dierckx, S., Eichwalder, P., Huyghebaert, A., (1997) Comparison of SDS-PAGE profiles of four Belgian cheeses by multivariate statistics, Lait 77 (1), 77–89
51. Alika, J.E., Akenova, M.E., Fatokun, C.A., (1995) Variation among maize (Zea mays L.) accessions of Bendel State, Nigeria – numerical analysis of zein protein band patterns, Genetic Resources and Crop Evolution 42 (4), 393–399



52. Magdic, D., Horvat, D., Jurkovic, Z., Sudar, R., Kurtanjek, K., (2002) Chemometric analysis of high molecular mass glutenin subunits and image data of bread crumb structure from Croatian wheat cultivars, Food Technology and Biotechnology 40 (4), 331–341
53. Jessen, F., Lametsch, R., Bendixen, E., Kjaersgard, I.V.H., Jorgensen, B.M., (2002) Extracting information from two-dimensional electrophoresis gels by partial least squares regression, Proteomics 2, 32–35
54. Kleno, T.G., Leonardsen, L.R., Kjeldal, H.O., Laursen, S.M., Jensen, O.N., Baunsgaard, D., (2004) Mechanisms of hydrazine toxicity in rat liver investigated by proteomics and multivariate data analysis, Proteomics 4, 868–880
55. Kjaersgard, I.V.H., Norrelykke, M.R., Jessen, F., (2006) Changes in cod muscle proteins during frozen storage revealed by proteome analysis and multivariate data analysis, Proteomics 6, 1606–1618
56. Gottfries, J., Sjogren, M., Holmberg, B., Rosengren, L., Davidsson, P., Blennow, K., (2004) Proteomics for drug target discovery, Chemometrics and Intelligent Laboratory Systems 73, 47–53
57. Karp, N.A., Griffin, J.L., Lilley, K.S., (2005) Application of partial least squares discriminant analysis to two-dimensional difference gel studies in expression proteomics, Proteomics 5, 81–90
58. Norden, B., Broberg, P., Lindberg, C., Plymoth, A., (2005) Analysis and understanding of high-dimensionality data by means of multivariate data analysis, Chemistry and Biodiversity 2 (11), 1487–1494
59. Malone, J., McGarry, K., Bowermann, C., (2006) Automated trend analysis of proteomics data using an intelligent data mining architecture, Expert Systems with Applications 30, 24–33
60. Marengo, E., Robotti, E., Gianotti, V., Righetti, P.G., (2003) A new approach to the statistical treatment of 2D-Pages in proteomics using fuzzy logic, Annali di Chimica 93 (1–2), 105–116
61. Marengo, E., Robotti, E., Righetti, P.G., Antonucci, F., (2003) A new approach based on fuzzy logic and principal component analysis for the classification of 2D maps in health and disease: application to lymphomas, Journal of Chromatography A 1004, 13–28
62. Marengo, E., Robotti, E., Gianotti, V., Righetti, P.G., Domenici, E., Cecconi, D., (2003) A new integrated statistical approach to the diagnostic use of proteomic two-dimensional maps, Electrophoresis 24, 225–236
63. Marengo, E., Robotti, E., Cecconi, D., Scarpa, A., Righetti, P.G., (2004) Application of fuzzy logic principles to the classification of 2D-PAGE maps belonging to human pancreatic cancers treated with Trichostatin-A, Proceedings of 2004 IEEE International Conference on Fuzzy Systems, Budapest, Hungary, 25–29 July 2004, 1, 359–364
64. Marengo, E., Robotti, E., Antonucci, F., Cecconi, D., Campostrini, N., Righetti, P.G., (2005) Spot matching in two-dimensional gels: a review of commercial software and of “home-made” approaches, Proteomics 5, 654–666



65. Zenkouar, H., Nachit, A., (1997) Images compression using moments method of orthogonal polynomials, Materials Science and Engineering B 49, 211–215
66. Yin, J., Rodolfo De Pierro, A., Wei, M., (2002) Analysis for the reconstruction of a noisy signal based on orthogonal moments, Applied Mathematics and Computation 132, 249–263
67. Hu, M.K., (1962) Visual pattern recognition by moment invariants, IRE Transactions on Information Theory 8, 179–187
68. Teague, M.R., (1980) Image analysis via the general theory of moments, Journal of the Optical Society of America 70, 920–930
69. Li, B.C., Shen, J., (1991) Fast computation of moment invariants, Pattern Recognition 24, 807–813
70. Chong, C., Raveendran, P., Mukundan, R., (2004) Translation and scale invariants of Legendre moments, Pattern Recognition 37, 119–129
71. Mukundan, R., Ramakrishnan, K.R., (1995) Fast computation of Legendre and Zernike moments, Pattern Recognition 28, 1433–1442
72. Zhou, J.D., Shu, H.Z., Luo, L.M., Yu, W.X., (2002) Two new algorithms for efficient computation of Legendre moments, Pattern Recognition 35, 1143–1152
73. Wee, C., Paramesran, R., Takeda, F., (2004) New computational methods for full and subset Zernike moments, Information Sciences 159, 203–220
74. Kan, C., Srinath, M.D., (2002) Invariant character recognition with Zernike and orthogonal Fourier-Mellin moments, Pattern Recognition 35, 143–154
75. Khotanzad, A., Hong, Y.H., (1990) Invariant image recognition by Zernike moments, IEEE Transactions on Pattern Analysis and Machine Intelligence 12, 489–497
76. Marengo, E., Bobba, M., Robotti, E., Liparota, M.C., (2005) Use of Legendre moments for the fast comparison of 2D-PAGE maps images, Journal of Chromatography A 1096 (1–2), 86–91
77. Marengo, E., Leardi, R., Robotti, E., Righetti, P.G., Antonucci, F., Cecconi, D., (2003) Application of three-way principal component analysis to the evaluation of two-dimensional maps in proteomics, Journal of Proteome Research 2 (4), 351–360
78. Schultz, J., Gottlieb, D.M., Petersen, M., Nesic, L., Jacobsen, S., Sondergaard, I., (2004) Explorative data analysis of two-dimensional electrophoresis gels, Electrophoresis 25 (3), 502–511


17

Finding the Significant Markers

Statistical Analysis of Proteomic Data

Sebastien Christian Carpentier, Bart Panis, Rony Swennen, and Jeroen Lammertyn

Summary

After separation by two-dimensional gel electrophoresis (2DE), the abundances of several hundred individual proteins can be quantified in a cell population or tissue sample. Both a good experimental setup and a valid statistical approach are essential for gaining insight into the data and drawing correct conclusions. High-throughput 2DE proteomics yields large, complex datasets with a huge disproportion between the hundreds of variables and the restricted number of replicates. However, the most commonly used statistical tests were designed for a high number of replicates and a restricted number of variables. There is some inconsistency in the proteomics community regarding the use of statistics. Two approaches to data analysis can be distinguished: exploratory data analysis and confirmatory data analysis. Currently, most proteomic data are analyzed with an emphasis on confirmatory analysis, without taking exploratory data analysis into account. This chapter gives an overview of the typical exploratory and confirmatory statistical tools available. It also suggests case-specific guidelines for a reliable statistical approach to 2DE analysis. Examples are given for an experimental setup based on classical staining methods as well as for the more advanced difference gel electrophoresis (DIGE).

Key Words: assumptions; confirmatory data analysis; experimental set-up; exploratory data analysis; missing values; multivariate statistics; non-parametric test; parametric test; principal component analysis; univariate statistics.

From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols
Edited by: A. Vlahou © Humana Press, Totowa, NJ



328 Carpentier et al.<br />

1. Introduction

The conventional approach to analyzing a biological problem is to collect data in order to test a particular hypothesis. The data collected under this hypothesis should lead to an objective and reliable decision, so that the hypothesis can be accepted, revised, or rejected. This confirmatory way of analyzing data involves a number of steps that define the experimental setup. However, our understanding of a biological system is usually rather limited, and the data may be very heterogeneous and complex. Exploratory data analysis approaches a biological problem from a different angle and tries to describe patterns, relationships, trends, outlying data, etc. Two-dimensional gel electrophoresis (2DE) simultaneously quantifies the abundances of hundreds of individual proteins in a cell population or tissue sample. High-throughput 2DE proteomics yields large, complex datasets with a huge disproportion between the hundreds of variables and the restricted number of replicates. Most commonly used statistical tests are intended for confirmatory data analysis and were designed for a high number of replicates and a restricted number of variables.

Both a good experimental setup and a valid statistical approach are extremely important. There is some inconsistency in the proteomics community, where proteomic data are currently analyzed by a variety of approaches. The objective of this chapter is to give a concise overview of the statistical methods used in functional genomics and to find a good practical compromise between statistics and proteome analysis. The chapter deals with experimental design and data analysis. At the end, it provides two practical examples: a classical staining approach and a DIGE approach. Section 2 discusses replicates and sample pooling, and briefly covers the calibration, normalization, and quantification of data. Section 3 discusses confirmatory univariate and exploratory multivariate analysis, together with their assumptions and associated problems.

2. Experimental Design

The design of an experiment is crucial for the robustness of the results obtained, and careful planning is essential to maximize the information an experiment yields. The experimental conditions must be designed to keep the variation within each experimental group as small as possible. The experimental setup should also be kept as simple as possible so that the data remain manageable. When the impact of a particular treatment is to be examined, proper controls (positive and negative) should be included. Irrelevant external influences should be eliminated or accounted for, e.g., by a randomized design.


Statistical Analysis of Proteomic Data 329<br />

The conventional approach to analyzing a biological problem is to collect data in order to test a particular hypothesis. The collected data should enable the researcher to make an objective and reliable decision about the hypothesis. The experimental setup usually involves several steps: (1) state a null hypothesis (H₀) (e.g., there is no difference in protein abundance between the treatments) and its alternative (H₁) (e.g., there is a difference between the treatments); (2) choose the most appropriate test statistic to check the hypothesis; (3) specify a significance level (i.e., the accepted probability of a false positive result, that is, of wrongly rejecting the null hypothesis); (4) specify the sample size (number of replicates) needed for sufficient power; and (5) collect the data. The power of a statistical test is its ability to detect real differences between the experimental groups, that is, to avoid false negative results. Power depends on the variance, the change in abundance, the number of replicates, the statistical test chosen, and the predetermined significance level. Lilley and Karp have illustrated the relationship between power, replicate number, and relative expression change in a proteomics experiment (1). Urfer et al. consider the effect of testing all the proteins simultaneously by means of the family-wise error rate and the false discovery rate (2). Increasing the number of replicates is the most direct way to control the power of a statistical test. Given the labor and cost involved in 2DE analysis, however, the number of replicates is often restricted, so the variance (technical and biological) should be kept under control.
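The interplay of variance, fold change, replicate number, and significance level can be made concrete with a quick calculation. The minimal sketch below uses the normal approximation to the power of a two-sided, two-sample comparison on log2 abundances. It is not the calculation used by Lilley and Karp. Because it uses z instead of t, it somewhat overestimates power at the three to six replicates typical of 2DE. The fold change and standard deviation are illustrative assumptions, not data from this chapter.

```python
import math
from statistics import NormalDist


def approx_power(fold_change, sd_log2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of log2 abundances.

    Normal approximation: power ~ Phi(|delta| / SE - z_(1 - alpha/2)), with
    delta = log2(fold_change) and SE = sd_log2 * sqrt(2 / n_per_group).
    The tiny probability of rejecting in the wrong direction is ignored, and
    using z instead of t makes the result optimistic for very small n.
    """
    std_normal = NormalDist()
    delta = abs(math.log2(fold_change))
    se = sd_log2 * math.sqrt(2.0 / n_per_group)
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    return std_normal.cdf(delta / se - z_crit)


def replicates_needed(fold_change, sd_log2, target_power=0.8, alpha=0.05, n_max=100):
    """Smallest number of replicates per group reaching target_power (same approximation)."""
    for n in range(2, n_max + 1):
        if approx_power(fold_change, sd_log2, n, alpha) >= target_power:
            return n
    return None


# Illustrative values: a 1.5-fold change and a per-group SD of 0.4 on the log2 scale.
for n in (3, 4, 6, 10):
    print(n, "replicates:", round(approx_power(1.5, 0.4, n), 2))
print("needed for 80% power:", replicates_needed(1.5, 0.4))
```

With these illustrative numbers, the approximation asks for eight replicates per group to reach 80% power. That is above the usual 2DE range, which is why keeping the variance small matters so much.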

2.1. Replicates<br />

A much-discussed subject is the nature of replicates. Two types of replicates are reported in 2DE studies: (1) technical replicates, that is, repeated measurements of the same sample (e.g., the same protein extract), and (2) biological replicates, that is, measurements of different biological samples within the same experimental group. Ideally, only biological replicate samples should be used, and technical variability should be limited to the strict minimum so that repeated measurement of the same sample is unnecessary (Fig. 1A). Therefore, both a reliable sample preparation method (3) and extensive experience in electrophoresis and proteomic techniques are indispensable (4,5,6,7). Technical variability can be introduced at the level of (1) sample collection, (2) sample preparation and protein extraction, (3) sample loading and electrophoresis, and (4) staining and image analysis. Some staining methods, like silver staining, involve many steps, and each sample is run on an individual gel, which makes the approach susceptible to technical variation. Technical replicates might be considered in experiments with a low sample yield, with cost restrictions, or when the technical variability is still too high (high inter-gel variability) (Fig. 1B).


330 Carpentier et al.<br />

In any case, care should be taken when analyzing technical replicates alongside biological replicates. Statistically speaking, we are dealing with mixed models and nested designs (8,9). Karp et al. discuss the impact of mixing biological and technical replicates in a proteomics experiment (10). Treating technical replicates as biological replicates can increase the rate of false positives. Analyzing biological and technical replicates in one test seems reasonable only in a nested ANOVA. If another statistical test is used, only the biological replicates should be included (Fig. 1A), and the technical repetitions of the same biological samples (protein extracts) should be treated as a distinct, confirmatory analysis. Given the low technical variance observed with the difference gel electrophoresis (DIGE) approach (see below), the value of analyzing technical replicates is questionable, and they can be skipped (Fig. 1A).
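For a balanced design, the point about nested analysis can be checked numerically. The nested-ANOVA F-test for the treatment effect uses the variation among biological samples as its error term. It therefore gives the same result as a pooled t-test on the per-sample means of the technical replicates (F = T²). Treating every technical measurement as independent inflates the degrees of freedom instead. The sketch below assumes a balanced layout, with equal numbers of biological samples per group and technical replicates per sample. The abundance values are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t(x, y):
    """Student's two-sample t statistic with pooled variance (equal variances assumed)."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(y) - mean(x)) / math.sqrt(sp2 * (1 / nx + 1 / ny))


def nested_anova_f(groups):
    """Treatment F of a balanced two-level nested ANOVA and its degrees of freedom.

    groups[i][j][k] is technical replicate k of biological sample j in treatment i.
    The error term is the mean square among biological samples within treatments,
    so technical replicates do not inflate the degrees of freedom.
    """
    a, b, n = len(groups), len(groups[0]), len(groups[0][0])
    grand = mean(v for g in groups for s in g for v in s)
    group_means = [mean(v for s in g for v in s) for g in groups]
    ss_treat = b * n * sum((gm - grand) ** 2 for gm in group_means)
    ss_bio = n * sum((mean(s) - gm) ** 2 for g, gm in zip(groups, group_means) for s in g)
    df_treat, df_bio = a - 1, a * (b - 1)
    return (ss_treat / df_treat) / (ss_bio / df_bio), (df_treat, df_bio)


# Hypothetical log abundances of one spot: 4 biological samples x 3 technical replicates.
control = [[5.1, 5.0, 5.2], [5.4, 5.5, 5.3], [4.9, 5.0, 4.8], [5.2, 5.1, 5.3]]
treated = [[5.6, 5.7, 5.5], [5.3, 5.4, 5.2], [5.9, 5.8, 6.0], [5.5, 5.6, 5.4]]

f_nested, df = nested_anova_f([control, treated])
t_means = pooled_t([mean(s) for s in control], [mean(s) for s in treated])
t_naive = pooled_t([v for s in control for v in s], [v for s in treated for v in s])

print(f"nested F = {f_nested:.3f}, df = {df}")
print(f"t^2 on biological-sample means = {t_means ** 2:.3f}, df = 6")
print(f"naive t^2 (technical replicates as independent) = {t_naive ** 2:.3f}, df = 22")
```

For these made-up values, the naive analysis gives a T² of roughly 21 on 22 degrees of freedom. The correct analysis gives roughly 7 on 6 degrees of freedom, which illustrates how false positives arise.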

2.2. Pooling<br />

Another much-debated subject is the pooling of biological samples. Pooling individual tissues or cells yields an averaged sample. On the one hand, pooling reduces the biological variance when detecting changes in protein abundance between the averages of the experimental groups, and thereby increases power. On the other hand, relevant information about individuals is inevitably lost. Pooling is usually done when the biological variation within an experimental group is too large (Fig. 1C and 1D), or when the starting material from an individual is insufficient for protein extraction. Pooling of samples might be useful, but it must be evaluated for each individual experimental setup.
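A commonly used idealized variance model makes the pooling trade-off explicit. This model is an assumption, not a result from this chapter. Suppose a pool mixes k individuals in equal amounts, biological effects are independent, and the technical error adds once per gel. The variance of one pooled measurement is then roughly σ_b²/k + σ_t², where σ_b² is the biological and σ_t² the technical variance. The sketch evaluates this model with illustrative variances. It shows that pooling shrinks only the biological term and never pushes the variance below the technical floor.

```python
def pooled_measurement_variance(var_bio, var_tech, k):
    """Variance of one measurement of a pool of k individuals (idealized model).

    Assumes equal contributions per individual, independent biological effects,
    and an additive technical error per gel; all of these are simplifications.
    """
    if k < 1:
        raise ValueError("a pool must contain at least one individual")
    return var_bio / k + var_tech


def variance_of_group_mean(var_bio, var_tech, k, n_gels):
    """Variance of a group mean when n_gels independent pools of size k are run."""
    return pooled_measurement_variance(var_bio, var_tech, k) / n_gels


# Illustrative values: biological variance four times the technical variance.
var_bio, var_tech = 0.16, 0.04
for k in (1, 2, 4, 8):
    print(f"pool of {k}: variance per gel = {pooled_measurement_variance(var_bio, var_tech, k):.3f}")
```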

2.3. Data Processing<br />

Common strategies for the quantitative determination of gel-separated proteins include organic dyes (e.g., colloidal Coomassie blue), silver staining, radiolabeling, and fluorescent stains (e.g., Deep Purple, Flamingo, SYPRO Orange/Red/Ruby, ruthenium complexes, and succinimidyl ester derivatives of cyanine dyes). The choice of a staining method should be carefully considered, taking into account the available lab equipment, the budget, and the performance of the method. The dynamic range of the staining method and the technical variability both have a great impact on the power of a statistical test. Both are decisive for the experimental setup (the number of replicates) and the choice of statistical test.

Data from 2DE analysis are generated by image analysis software that detects and quantifies protein spots and matches the same proteins across different gels. An important challenge in 2DE is to estimate the protein concentration of each extract so that all gels are loaded with an equal amount of protein, which minimizes technical variation. Most current software packages take this into account and apply a calibration or normalization to compensate for image differences caused by protein loading, staining, and scanning.

Fig. 1. Experimental setup. Theoretical examples of an experimental setup comparing control and treatment. (A) Small intra-group variation and small technical variation: four biological replicates for control and four biological replicates for treatment. (B) Small intra-group variation and large technical variation (mixed model): four biological and three technical replicates for control, and the same for treatment. (C) Large intra-group variation and small technical variation: four replicates of a biological pool for control, and the same for treatment. (D) Large intra-group variation and large technical variation (mixed model): four replicates of a biological pool and three technical replicates for control, and the same for treatment.

2.3.1. Classical Approach<br />

In a classical approach (e.g., silver or Coomassie staining), calibration takes into account differences in scanning properties, such as image bit depth. Scanner grey values are converted to optical densities so that intensities no longer depend on the original pixel depth. For classical staining, the most logical normalization procedure to compensate for possible loading differences is percent volume (%vol). Here, each spot volume is divided by the total volume of all spots. Normalized data, whether or not transformed, can subsequently be analyzed with a relevant statistical test (see below).
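The idea behind converting grey values to optical densities can be sketched under a simplifying assumption. Suppose a transmission scan where the grey value is proportional to transmitted light and the maximum grey value (2^bits − 1) corresponds to a clear, unstained area. Then OD = log10(g_max / g) gives the same optical density for the same physical transmittance, whether the image was scanned at 8 or 16 bits. This illustrates the principle only. Image analysis packages may calibrate against their own reference instead, and that procedure is not reproduced here.

```python
import math


def grey_to_od(grey, bit_depth):
    """Convert a transmission-scan grey value to optical density (idealized).

    Assumes grey is proportional to transmitted light and that the maximum
    value 2**bit_depth - 1 means full transmission (OD = 0).
    """
    g_max = 2 ** bit_depth - 1
    if not 0 < grey <= g_max:
        raise ValueError("grey value must lie in (0, 2**bit_depth - 1]")
    return math.log10(g_max / grey)


# The same 10% transmittance scanned at two pixel depths gives (almost) the same OD;
# the small difference comes from rounding to integer grey levels at 8 bits.
od_8 = grey_to_od(round(0.10 * 255), 8)
od_16 = grey_to_od(round(0.10 * 65535), 16)
print(round(od_8, 3), round(od_16, 3))
```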

The most commonly used organic stain is Coomassie brilliant blue (CBB). CBB staining has a relatively good dynamic range (approximately 10³) and is fully compatible with mass spectrometry (MS). However, its sensitivity is relatively low: the limit of protein detection for colloidal CBB stain is approximately 8–10 ng (11). Therefore, several modifications have been proposed to improve its sensitivity. For an overview, see (12).

The introduction of the first sensitive silver-staining method (13) was a major breakthrough in protein detection. It led to extensive research and various alternative silver-staining protocols (14). Silver staining is still one of the most sensitive non-radioactive detection techniques, with a detection limit in the low nanogram range. However, its linearity and dynamic range are relatively poor (approximately 10² or less), and the staining is protein-dependent. Gel-to-gel variation is also not negligible because of the numerous solution changes and other carefully timed steps.

2.3.2. Difference Gel Electrophoresis Approach<br />

Fluorescence-based methods are increasingly replacing conventional technologies. A standard UV transilluminator can be used to visualize most fluorescent stains, but more sophisticated and expensive CCD cameras or laser scanners are needed for quantitative determination. Succinimidyl ester derivatives of different cyanine fluorescent dyes modify free amino groups of proteins prior to separation (15). Their development was a major achievement in terms of reproducibility and throughput. The DIGE approach uses fluorophores with different absorption optima, which makes it possible to run multiple samples simultaneously in the same gel. The dyes were designed so that a protein acquires the same relative mobility irrespective of the dye used to tag it. The difference in molecular weight (MW) introduced by linkers of different length is compensated by different alkyl moieties opposite the linker moiety. Originally, only two cyanine dyes were included (Cy3 and Cy5). The concept was then extended with a third dye (Cy2), which opened the way to an entirely new experimental design that further exploits the multiplexing capabilities of the dyes by including an internal standard (16,17). The internal standard is a mixture of equal amounts of each sample and enables a powerful normalization procedure for highly accurate protein quantification. This normalization reduces variability considerably and provides reasonable grounds for using powerful parametric statistics after transformation of the standardized volume. If multiple conditions are tested across different electrophoresis runs, one common internal standard should be created and included in all gels of each run. However, if the experimental setup is too complex, the internal standard will contain too many samples, possibly causing spots from different samples to overlap. The minimal labeling approach has a dynamic range of four to five orders of magnitude, and its sensitivity is currently marginally lower than that of silver staining (18). Although the dyes have been carefully designed, the experimental design should account for possible dye-specific effects. Therefore, a supervised randomization of the Cy3/Cy5 labeling is highly recommended. Not only should the labeling be randomized, but the samples representing an experimental group should also be distributed across gels to avoid systematic gel artefacts.
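The internal-standard normalization and the dye randomization recommended above can be sketched in a few lines. First, each spot's Cy3 or Cy5 volume is divided by the Cy2 (internal standard) volume of the same spot on the same gel and log-transformed, so gel-to-gel differences in loading and running cancel. This follows the general DIGE idea of a standardized abundance but is not claimed to reproduce DeCyder's implementation. Second, `randomized_dige_layout` is a hypothetical helper written for this illustration. It pairs one sample from each group per gel, swaps dyes on alternate gels, and shuffles the order.

```python
import math
import random


def log_standardized_abundance(sample_vol, standard_vol):
    """log10 ratio of a spot's sample volume (Cy3 or Cy5) to its Cy2 standard volume.

    Both volumes must come from the same spot on the same gel.
    """
    if sample_vol <= 0 or standard_vol <= 0:
        raise ValueError("volumes must be positive")
    return math.log10(sample_vol / standard_vol)


def randomized_dige_layout(group_a, group_b, seed=None):
    """Hypothetical helper: one sample of each group per gel, with balanced dyes.

    Group A is labeled with Cy3 on half of the gels and with Cy5 on the other
    half (dye swap); samples and gel order are shuffled. Cy2 always carries
    the pooled internal standard.
    """
    if len(group_a) != len(group_b):
        raise ValueError("this simple sketch assumes equal group sizes")
    rng = random.Random(seed)
    a, b = list(group_a), list(group_b)
    rng.shuffle(a)
    rng.shuffle(b)
    gels = []
    for i, (sa, sb) in enumerate(zip(a, b)):
        if i % 2 == 0:
            gels.append({"Cy2": "internal standard", "Cy3": sa, "Cy5": sb})
        else:
            gels.append({"Cy2": "internal standard", "Cy3": sb, "Cy5": sa})
    rng.shuffle(gels)
    return gels


layout = randomized_dige_layout(["C1", "C2", "C3", "C4"], ["T1", "T2", "T3", "T4"], seed=1)
for gel_no, gel in enumerate(layout, start=1):
    print(gel_no, gel)

# A spot twice as abundant in the sample as in the internal standard:
print(round(log_standardized_abundance(2000.0, 1000.0), 4))
```

Because every gel carries the same internal standard, the log ratios are comparable across gels. The dye swap keeps any dye-specific bias from lining up with the treatment.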

3. Data Analysis<br />

3.1. Confirmatory Univariate Data Analysis<br />

Univariate statistical methods examine the individual protein spots one by one, treating the different proteins as independent measurements. Table 1 gives an overview of some commonly used parametric and non-parametric univariate tests. Univariate methods start from the null hypothesis that there is no difference between the two experimental populations.

Table 1
Overview of Some Commonly Used Univariate Tests

Classes of data          Parametric   Non-parametric
Comparing 2 treatments   T-test       Mann–Whitney/Wilcoxon test; Kolmogorov–Smirnov test
Comparing k treatments   ANOVA        Kruskal–Wallis test

Parametric models like Student's T-test start from the observed samples. They assume that the observed sample mean and variance approximate the real population mean and variance, and that the variances of the two experimental populations are equal. Based on the observed mean and variance, the two populations are assumed to be normally distributed, and a model is built (Fig. 2). If the test statistic (or T-value) is large enough, the null hypothesis is rejected (Eq. 1). The numerator measures the distance between the experimental means and thus estimates the inter-group variability. The denominator approximates the real variability and estimates the intra-group variability.

$$T^2 = \frac{(\bar{y}_2 - \bar{y}_1)^2}{S_P^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} \qquad (1)$$

where $\bar{y}_i$ is the experimental mean of group i (an estimate of the population mean $\mu_i$). $S_P^2$ is the pooled sample variance, an estimate of the common variance computed as a weighted average of the group variances that accounts for the number of replicates in each group. $n_i$ is the number of replicates in experimental group i.
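Equation 1 translates directly into code. The sketch below computes the pooled variance S_P² and the squared statistic T² for two groups of replicate measurements. The values are hypothetical log abundances of one spot. The test decision follows from comparing |T| with the critical t value on n₁ + n₂ − 2 degrees of freedom.

```python
from statistics import mean, variance


def t_squared(y1, y2):
    """Squared two-sample T statistic of Eq. 1 (pooled, equal-variance form)."""
    n1, n2 = len(y1), len(y2)
    # Pooled sample variance: group variances weighted by their degrees of freedom.
    sp2 = ((n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)) / (n1 + n2 - 2)
    return (mean(y2) - mean(y1)) ** 2 / (sp2 * (1 / n1 + 1 / n2))


control = [5.10, 5.40, 4.90, 5.20]      # hypothetical log abundances, group 1
treatment = [5.60, 5.30, 5.90, 5.50]    # hypothetical log abundances, group 2
t2 = t_squared(control, treatment)
df = len(control) + len(treatment) - 2
print(f"T^2 = {t2:.3f}, |T| = {t2 ** 0.5:.3f}, df = {df}")
```

For these values, T² ≈ 6.83 (|T| ≈ 2.61). This exceeds the two-sided 5% critical value of t on 6 degrees of freedom (about 2.447), so H₀ would be rejected. That conclusion is only valid if the normality and equal-variance assumptions discussed next hold.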

Parametric univariate statistical tests are very powerful, but the data must satisfy restrictive assumptions (continuous and normally distributed data, homogeneity of variance, and independent samples), and these assumptions must be tested. Levene's test is commonly used to assess homogeneity of variances, and the Shapiro–Wilk test to assess normality (19). If an assumption is not met, the significance levels and the power of the test may be invalid. Data transformations (e.g., logarithm, arcsine, square root) are frequently used to improve the distribution characteristics (normality and homogeneity of variance) (20). The problem with proteomic data is the low number of replicates. Given the labor and cost involved in 2DE analysis, the number of replicates usually ranges between 3 and 6. With such small samples, it is practically impossible to test these assumptions. Tests like Levene's test and the Shapiro–Wilk test are designed for larger sample sizes and have very limited power at the sample sizes commonly used in proteomics experiments.

Fig. 2. Distribution of two normal populations with homogeneous variance. $\mu_i$: real population mean, estimated by the sample mean.


Although some empirical evidence indicates that slight deviations from the assumptions underlying parametric tests may not have radical effects on the obtained probability levels, there is no general agreement as to what constitutes a “slight” deviation (21).
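To show what Levene's test actually measures, the sketch below computes its statistic W. The test is a one-way ANOVA on the absolute deviations of each observation from its group centre. The sketch uses the group mean as the centre (Levene's original form); passing `center="median"` gives the Brown–Forsythe variant. W is referred to an F distribution with (k − 1, N − k) degrees of freedom. This sketch computes only the statistic, not the p-value. With three to six replicates per group, the test has very little power, which is the practical problem described above.

```python
from statistics import mean, median


def levene_w(*groups, center="mean"):
    """Levene's W: one-way ANOVA F statistic on absolute deviations from each group's centre.

    center="median" gives the Brown-Forsythe variant. W is referred to an
    F distribution with (k - 1, N - k) degrees of freedom.
    """
    centre_fn = mean if center == "mean" else median
    z = []
    for g in groups:
        c = centre_fn(g)
        z.append([abs(y - c) for y in g])
    k = len(z)
    n_total = sum(len(zi) for zi in z)
    z_means = [mean(zi) for zi in z]
    z_grand = mean(v for zi in z for v in zi)
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    if within == 0:
        raise ValueError("all absolute deviations are equal within groups")
    return (n_total - k) / (k - 1) * between / within


# Two groups with identical spread (one is a shifted copy of the other): W = 0.
print(levene_w([1, 2, 3, 4], [11, 12, 13, 14]))
# A group with a much larger spread gives a larger W.
print(round(levene_w([1, 2, 3, 4], [0, 10, 20, 30]), 3))
```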

An alternative to parametric tests is the use of non-parametric tests. These do not assume any particular distribution for the data but usually have relatively low power (21). Their only assumptions are independent observations and continuous (at least ordinal) data. A useful non-parametric test is the Kolmogorov–Smirnov test, which determines whether or not the experimental groups come from the same distribution. To this end, the data points in each experimental group are sorted in ascending order, and an empirical distribution function is calculated without any assumption about distribution or variance. The Kolmogorov–Smirnov test statistic D is defined as the maximum distance between the cumulative distributions of the two experimental groups (for an example, see Fig. 5).

$$D_{n_1 n_2} = \max_X \left| S_{n_1}(X) - S_{n_2}(X) \right| \qquad (2)$$

where $S_{n_i}(X) = K_i / n_i$, $K_i$ is the number of data in group i equal to or less than X, and $n_i$ is the number of replicates in experimental group i.
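Equation 2 is easy to compute directly. The sketch below evaluates D at every observed value, since the empirical distribution functions change only there. Because 2DE experiments have so few replicates, it also obtains an exact permutation p-value by enumerating every split of the pooled data into two groups of the original sizes. This enumeration is one standard way to get small-sample p-values. It is not necessarily how Image Master computes them. The %vol values are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def ecdf(sample, x):
    """S_n(x) = K/n, with K the number of values less than or equal to x."""
    return Fraction(sum(v <= x for v in sample), len(sample))


def ks_d(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |S_n1(X) - S_n2(X)| (Eq. 2)."""
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in list(a) + list(b))


def ks_exact_p(a, b):
    """Observed D and exact permutation p-value: share of all splits with D >= observed D."""
    pooled = list(a) + list(b)
    d_obs = ks_d(a, b)
    hits = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        x = [pooled[i] for i in idx]
        y = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        hits += ks_d(x, y) >= d_obs
    return d_obs, Fraction(hits, total)


group_a = [0.21, 0.25, 0.19, 0.30, 0.27, 0.23]   # hypothetical %vol, condition A
group_b = [0.35, 0.41, 0.33, 0.38, 0.29, 0.44]   # hypothetical %vol, condition B
d, p = ks_exact_p(group_a, group_b)
print(f"D = {float(d):.3f}, exact p = {float(p):.4f}")
```

Exact fractions avoid floating-point ties when comparing D values. For these values, D = 5/6, and 24 of the 924 possible splits are at least as extreme, so p = 24/924 ≈ 0.026.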

3.2. Exploratory Multivariate Data Analysis<br />

Univariate statistical tests, such as the T-test, the Kolmogorov–Smirnov test, ANOVA, or the Kruskal–Wallis test, were not designed to analyze complex datasets containing multiple correlated variables. Proteomic datasets generally contain hundreds of different proteins that are correlated: proteins fit within larger networks and interact with each other. Univariate statistics test the individual variables one by one and cannot detect correlations with other variables (proteins). Moreover, testing hundreds of variables (protein spots) one by one, each with an accepted risk of false positives (α), increases the chance of reporting false positive cases (the multiple testing issue). It also assumes that the different variables (proteins) are uncorrelated. Proteins are not uncorrelated: they fit within multiple biological pathways and may be closely correlated. The field of multivariate analysis consists of statistical techniques that consider two or more related random variables as a single entity. These techniques attempt to produce an overall result that takes the relationships among the variables into account (22). In contrast to a univariate approach, multivariate analysis displays the inter-relationships between a large number of variables and can correlate multiple proteins with a specific experimental group. The data from different image analysis software packages can be exported and analyzed with several software packages for multivariate analysis. Some commonly used packages are Unscrambler, MATLAB, SAS, and Statistica. GE Healthcare developed a statistical software package for the DIGE approach (EDA, extended data analysis), which is linked to the image analysis software DeCyder. The package offers both univariate and multivariate tools. Here, we mainly discuss the use of Principal Component Analysis (PCA). For an overview of other possibilities of the EDA package and more DIGE-related statistical examples, see Chapter 6.
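The multiple testing issue is usually handled in one of two ways. One controls the family-wise error rate, for example with the Bonferroni correction. The other controls the false discovery rate, for example with the Benjamini–Hochberg step-up procedure. These are the two criteria considered by Urfer et al. (2). The sketch below applies both to a list of per-spot p-values. The p-values are invented for illustration. The Benjamini–Hochberg procedure as written assumes independent or positively dependent tests.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0 for spot i if p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate).

    Find the largest rank k with p_(k) <= k / m * q and reject the k smallest p-values.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


spot_p = [0.001, 0.008, 0.012, 0.041, 0.049, 0.22, 0.35, 0.61, 0.74, 0.90]
print(sum(bonferroni(spot_p)), "spot(s) pass Bonferroni")
print(sum(benjamini_hochberg(spot_p)), "spot(s) pass Benjamini-Hochberg")
```

Bonferroni is strict because it guards against even one false positive. The false discovery rate instead tolerates a controlled proportion of false positives among the reported spots, which is often a better fit for exploratory proteomics.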

3.2.1. Principal Component Analysis<br />

Principal Component Analysis is one of the multivariate methods for exploratory data analysis. A comprehensive overview of the use of PCA in statistics is given by Sharma (23). The basics of PCA date back to Karl Pearson in 1901 (24), and the procedure as we know it today was developed by Harold Hotelling in 1933 (25). Multivariate methods were already used to analyze 2DE data in the early days of the technique (26) and are an emerging application in transcriptomics and proteomics (27,28,29,30,31). PCA condenses the information contained in a large dataset into a smaller number of artificial factors that explain most of the observed variance. The most logical modus operandi is to treat the biological replicate samples of the experimental groups as observations (score plot) and the proteins as variables (loading plot). The score plot reveals trends among the samples, and the loading plot identifies the proteins that explain these trends. A principal axis transformation converts the correlated variables (proteins) into new, uncorrelated variables. A principal component (PC) is a linear combination of the existing variables (proteins):

$$\mathrm{PC1} = a_1(\text{protein } 1) + a_2(\text{protein } 2) + \dots + a_n(\text{protein } n)$$
$$\mathrm{PC2} = b_1(\text{protein } 1) + b_2(\text{protein } 2) + \dots + b_n(\text{protein } n)$$

The relation between the original variables (proteins) and the PCs is displayed in the loading plot. A protein with a high loading on a specific PC contributes strongly to that PC, and hence to the part of the sample variance that the PC explains. The starting point for PCA is the sample covariance matrix. It can be shown that the sum of the original variances equals the sum of the eigenvalues of the sample covariance matrix. The eigenvalues are the variances of the PCs. The ratio of each eigenvalue to the total variance therefore indicates the portion of the total variability accounted for by each PC. For the fundamentals of data manipulation and a more detailed description of the properties and mechanisms of multivariate analysis and PCA, the reader is referred to the books of Jackson and Sharma (22,23).
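The path from covariance matrix to eigenvalues, explained variance, loadings, and scores can be sketched without any statistics package. The code below centres a samples × proteins matrix and forms the sample covariance matrix. It then diagonalizes that matrix with the cyclic Jacobi eigenvalue method, a standard algorithm for symmetric matrices. Jacobi is used here only to keep the example dependency-free; with hundreds of proteins, an optimized library routine would be the practical choice. The code also checks the property stated above, that the eigenvalues sum to the total variance. The 4 × 3 data matrix is hypothetical.

```python
import math


def covariance_matrix(data):
    """Sample covariance matrix (n - 1 denominator) and centred data of a samples x proteins matrix."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)] for i in range(p)]
    return cov, centred


def jacobi_eigen(matrix, tol=1e-20, max_sweeps=100):
    """Cyclic Jacobi method: eigenvalues and eigenvectors (columns of v) of a symmetric matrix."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the rotated a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p, q:  A <- A R
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p, q:  A <- R^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors:  V <- V R
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def pca(data):
    """Eigenvalues (descending), explained-variance ratios, loadings per PC, and sample scores."""
    cov, centred = covariance_matrix(data)
    eigvals, vecs = jacobi_eigen(cov)
    order = sorted(range(len(eigvals)), key=lambda i: eigvals[i], reverse=True)
    eigvals = [eigvals[i] for i in order]
    loadings = [[vecs[r][i] for r in range(len(vecs))] for i in order]  # one vector per PC
    ratios = [ev / sum(eigvals) for ev in eigvals]
    scores = [[sum(x * w for x, w in zip(row, pc)) for pc in loadings] for row in centred]
    return eigvals, ratios, loadings, scores, cov


# Hypothetical abundances: rows = samples (2 control, 2 treatment), columns = 3 proteins.
data = [
    [1.0, 2.0, 0.5],
    [1.2, 2.1, 0.4],
    [2.0, 1.0, 0.6],
    [2.2, 0.9, 0.5],
]
eigvals, ratios, loadings, scores, cov = pca(data)
print("explained variance ratio per PC:", [round(r, 3) for r in ratios])
print("total variance:", round(sum(cov[i][i] for i in range(3)), 6),
      "sum of eigenvalues:", round(sum(eigvals), 6))
print("PC1 scores:", [round(s[0], 3) for s in scores])
```

In this toy example, the PC1 scores separate the two control samples from the two treatment samples. That is the kind of trend a score plot is meant to reveal.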

It is very important to understand what is calculated and what assumptions the different models make. The EDA software lets the user choose what to treat as observations and as variables. Hence, the user can also work with the transposed data matrix, treating the gel images as variables (loading plot) and the proteins as observations (score plot). This might help to improve the image analysis and to detect protein mismatches. It should not, however, be used to explore the inter- and intra-group variability of the biological samples.

Exploratory PCA does not impose strict requirements on the data. Most PCA applications are descriptive in nature, and in these instances distributional assumptions are of secondary importance (22). The only requirement is that the dataset be complete, meaning that no spot values may be missing in any of the samples. Techniques for performing PCA on incomplete data, or for estimating missing data, can solve this problem. Several methods for estimating missing data have been reported by the microarray community (32,33,34). A missing value in 2DE proteomics occurs when a spot is detected in the reference or master gel but not in one of the other sample gel images. It also occurs when a spot is detected but not matched to the reference or master gel. Possible causes of missing values are (1) faint spots close to the detection limit, detected in one gel but not in another; (2) mismatches, probably caused by distortions in the protein pattern; and (3) absence of spots due to poor transfer from the first to the second dimension. Grove et al. showed that the staining procedure was an important source of missing values (27). The DIGE concept, with its common internal standard, mitigates the missing value problem to some extent by matching the different internal standard images. Good sample preparation (3) and solid experience in electrophoresis and proteomic techniques also reduce this problem. Nevertheless, missing values are inherent to 2DE and must be faced.

Some software packages replace missing values with zero, and others remove all variables with missing values. Introducing zeros leaves the results open to serious bias when a protein is mismatched in a particular sample or when a spot is missing because of a technical error. That protein then receives a large loading for the sample in question and incorrectly influences the score of that sample. When a protein is really absent or below the detection limit of the staining method, the missing values can be filled either with zeros or with a threshold value (35). A better alternative might be to average the samples within an experimental group and to explore the data based on the group mean. A missing value is still treated as zero and lowers the group mean, but its impact on the sample score plot is buffered by the averaging. The EDA package offers this possibility (see example below). Taking into account only the proteins that are detected and matched to the master or reference gel solves the problem of missing values, but much useful information is lost (see example below). The EDA package offers the possibility to filter the base dataset and select only those proteins that are 100% matched. Troyanskaya et al. showed that averaging is an improvement over replacing missing values with zeros. They also showed that averaging yields drastically lower accuracy than estimation methods such as singular value decomposition and weighted K-nearest neighbors (32).
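The sketch below illustrates the estimation approach. It follows the general idea of the weighted K-nearest-neighbor imputation described by Troyanskaya et al. (32): a missing spot value is estimated from the k spots whose abundance profiles across the gels are most similar, weighted by inverse distance. It is a simplified re-implementation for illustration, not the authors' code. Distance is taken as the root mean squared difference over the gels where both spots are observed, and the values are hypothetical. Such estimates suit only technical missing values, such as faint or mismatched spots. For proteins that are genuinely absent, zeros or a threshold value remain more appropriate, as discussed above.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries of a spots x gels matrix by weighted k-nearest-neighbor spots.

    For a spot missing in gel g, candidate neighbors are the other spots observed
    in gel g. Distance is the root mean squared difference over gels where both
    spots have values, and neighbors are weighted by 1 / distance.
    """
    filled = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for g, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for j, other in enumerate(matrix):
                if j == i or other[g] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other) if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[g]))
            if not candidates:
                continue  # nothing to estimate from; leave the gap
            nearest = sorted(candidates)[:k]
            if nearest[0][0] == 0.0:
                filled[i][g] = nearest[0][1]  # identical profile: copy its value
                continue
            weights = [1.0 / d for d, _ in nearest]
            filled[i][g] = sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)
    return filled


gels = [
    [1.0, 1.1, 0.9, 1.0],    # spot 1
    [2.0, 2.2, None, 2.1],   # spot 2: not matched in gel 3
    [2.1, 2.3, 1.9, 2.0],    # spot 3 (profile close to spot 2)
    [5.0, 4.8, 5.2, 5.1],    # spot 4
]
result = knn_impute(gels, k=2)
print(round(result[1][2], 3))
```

Here the gap is filled mainly from spot 3, whose profile is closest. Zero replacement would instead put 0.0 into an otherwise stable profile.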

We recommend performing the initial PCA on the complete dataset rather than only on the proteins that appear significantly different in the individual univariate analyses. Multivariate statistics add value because they can differentiate the experimental groups in terms of correlated expression rather than absolute expression (28,36). The two approaches are complementary, but performing the analysis only on proteins found significant by univariate analysis might disregard useful information. We therefore recommend starting with exploratory multivariate analysis and subsequently comparing the data with the confirmatory univariate analysis of the individual proteins.

3.2.2. Marker Selection<br />

Principal Component Analysis is outstanding at detecting outlying data and correlations among the variables (proteins). However, it cannot set a threshold for deciding which proteins are significant in classifying the experimental groups, and such a threshold would allow an objective removal of the proteins that do not contribute to the class distinction. Several algorithms exist to select a subset of features from the whole dataset and to perform a classification. In proteome analysis, this corresponds to selecting the proteins that best discriminate the experimental groups. Partial least squares (PLS) regression has been promoted primarily within the area of chemometrics (37). In contrast to PCA, PLS is a supervised technique, mainly applied to link (or regress) a continuous response variable (or dependent variable) to a set of independent variables (e.g., the proteins in a gel). In proteomic data, however, the response variable is often discrete (e.g., treatment A, B, C, …) and takes only a fixed number of values. PLS discriminant analysis (PLS-DA) offers an algorithm to deal with this typical data structure. Analyzing the score and (correlation) loading plots identifies the proteins that are important in discriminating the experimental treatments. The variable importance plot (VIP) is a useful tool for this purpose. According to the user manual, the PLS algorithm of EDA first creates a supervised model of the data (predefined experimental groups). It then uses the variable influence on the projection (VIP) scores from the model to rank the proteins by how well they discriminate between the experimental groups. Discriminant analysis (DA) methods in general, and PLS-DA in particular, are used to calculate the probability or accuracy of the marker selection. The purpose of DA is to assign individual observations (samples) to one of the experimental groups [e.g., the classification of patient samples as healthy or tumor based on protein extracts (38)].
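The sketch below makes the VIP idea concrete. It fits a PLS1 model by the NIPALS algorithm to a dummy-coded class variable (0 = group A, 1 = group B), which is the simplest form of PLS-DA for two groups. It then computes VIP scores with the usual textbook formula, VIP_j = sqrt(p · Σ_a SS_a · w_aj² / Σ_a SS_a). Here w_a is the unit-length weight vector of component a, and SS_a is the part of the class variable explained by that component. This is a generic formulation, not the EDA implementation, whose details are not given here. The data are hypothetical, with protein 0 constructed to differ between the groups. A common rule of thumb treats VIP > 1 as important, because the squared VIP scores average to one.

```python
import math


def _centre(values):
    """Subtract the mean from a list of numbers."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def pls1_vip(x_rows, y, n_components=2):
    """VIP scores from a PLS1 model fitted by NIPALS to centred X (samples x variables) and y."""
    n, p = len(x_rows), len(x_rows[0])
    X = [_centre([row[j] for row in x_rows]) for j in range(p)]  # X[j]: variable j over samples
    yc = _centre(y)
    weights, ss = [], []
    for _ in range(n_components):
        w = [sum(xj[i] * yc[i] for i in range(n)) for xj in X]  # w proportional to X^T y
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0.0:
            break  # nothing left to explain
        w = [v / norm for v in w]
        t = [sum(w[j] * X[j][i] for j in range(p)) for i in range(n)]  # scores t = X w
        tt = sum(v * v for v in t)
        x_load = [sum(X[j][i] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(yc[i] * t[i] for i in range(n)) / tt
        # Deflate X and y by the part described by this component.
        X = [[X[j][i] - t[i] * x_load[j] for i in range(n)] for j in range(p)]
        yc = [yc[i] - q * t[i] for i in range(n)]
        weights.append(w)
        ss.append(q * q * tt)  # variation of y explained by this component
    if not ss:
        raise ValueError("X carries no information about y")
    total = sum(ss)
    return [math.sqrt(p * sum(s * w[j] ** 2 for s, w in zip(ss, weights)) / total) for j in range(p)]


# Hypothetical data: 3 samples per group, 4 proteins; protein 0 is built to differ between groups.
samples = [
    [1.0, 3.1, 0.50, 2.0],
    [1.1, 2.9, 0.60, 2.2],
    [0.9, 3.0, 0.40, 1.9],
    [2.0, 3.0, 0.55, 2.1],
    [2.1, 3.1, 0.45, 2.0],
    [1.9, 2.9, 0.50, 2.1],
]
classes = [0, 0, 0, 1, 1, 1]  # dummy-coded groups (two-class PLS-DA)
vip = pls1_vip(samples, classes)
for j, score in enumerate(vip):
    print(f"protein {j}: VIP = {score:.2f}")
```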

4. Examples<br />

4.1. Classical Dyes, 2 Conditions<br />

In this example, we examine two different conditions, analyze six biological samples per condition, and use classical CBB staining. The data were analyzed with Image Master Platinum software version 5 (GE Healthcare), which offers intensity calibration and spot normalization to compensate for technical variance. The relative volume (%vol) spot normalization is the best spot normalization procedure here because it takes into account both the intensity and the area of a spot (Eq. 3).

$$\%\mathrm{vol} = \frac{\mathrm{vol}}{\sum_{S=1}^{n} \mathrm{vol}_S} \times 100 \qquad (3)$$

where vol is the volume of the spot of interest and $\mathrm{vol}_S$ is the volume of spot S in a gel containing n detected spots.
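Equation 3 and the logarithmic transformation discussed next can be sketched as below. Each spot volume is divided by the total volume of all detected spots on the same gel and expressed as a percentage, which removes a uniform loading difference. The natural logarithm then reduces the right skew of the %vol distribution. The choice of base is an assumption; any base gives an equivalent transformation for the statistics. The spot volumes are hypothetical.

```python
import math


def percent_volume(volumes):
    """%vol of each spot: its volume divided by the total volume of all spots on the gel, x 100."""
    total = sum(volumes)
    return [100.0 * v / total for v in volumes]


def log_percent_volume(volumes):
    """Natural-log transform of %vol to reduce the right skew of the distribution."""
    return [math.log(pv) for pv in percent_volume(volumes)]


gel_1 = [1200.0, 350.0, 80.0, 15.0, 4800.0]     # hypothetical raw spot volumes
gel_2 = [2400.0, 700.0, 160.0, 30.0, 9600.0]    # the same sample loaded twice as heavily
print([round(v, 3) for v in percent_volume(gel_1)])
print([round(v, 3) for v in percent_volume(gel_2)])  # identical: loading difference removed
print([round(v, 3) for v in log_percent_volume(gel_1)])
```

Because %vol is a share of the whole gel, values for a single spot are not independent of the others. This is another reason to examine each spot separately, as the text goes on to explain.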

Although this spot normalization procedure reduces possible technical variance, it has consequences for the data. Normalizing all the spots transforms the data and creates an asymmetric (right-skewed) distribution (Fig. 3). A logarithmic transformation of the data improves the distribution characteristics (Fig. 4). However, univariate statistical methods were not developed to analyze all the spots simultaneously, as in Figs. 3 and 4. They examine the individual protein spots (variables) one by one, treating the different proteins as independent measurements. Therefore, each spot should be considered individually, and the real population of each experimental group for that protein spot must be estimated from the six replicates. Performing distribution tests like Levene's test and the Shapiro–Wilk test on six replicates is possible. It is unlikely, however, that their null hypotheses (homogeneous variance and normal distribution, respectively) will be rejected. The sample sizes need to be large enough to minimize the number of false results. With small samples, the populations will appear to be normally distributed and of equal variance even when this is not the case.

Taking into account the typical heterogeneity of variance associated with<br />

classical dyes, the %vol spot normalization of Image Master, and the limited<br />

sample size, a non-parametric statistical test seems to be the best choice<br />

in this case. We opted here for the non-parametric univariate Kolmogorov–<br />

Smirnov test. The test is one among the options offered by Image Master.<br />

It is a two-sample test with high power efficiency for small sample sizes.<br />

The reduced power of a non-parametric test was anticipated by including a


340 Carpentier et al.<br />

[Figure: histogram (Var1) of spot %vol values, number of observations vs. %vol, with the expected normal curve; Shapiro–Wilk W = 0.35883, p = 0.0000.]

Fig. 3. Distribution of protein spots analyzed by Image Master and normalized using the %vol criterion. The distribution is asymmetrical, with the majority of the spots lying between 0 and 0.1%.

[Figure: histogram (Var2) of the log-transformed %vol values, number of observations, with the expected normal curve; Shapiro–Wilk W = 0.98283, p = 0.00000.]

Fig. 4. A logarithmic transformation of the %vol data of Fig. 3.


Statistical Analysis of Proteomic Data 341<br />

higher number (6) of biological replicates. Figure 5 shows an example of an individual Kolmogorov–Smirnov test. For the complete experimental setup and biological background, see Carpentier et al. (39). The statistical options of the Image Master Platinum software are rather limited and focus on two experimental groups. The multivariate analysis offered by Image Master Platinum is factor analysis, a technique similar in nature to PCA. The results of both techniques are often quite similar, except that factor analysis explains the correlations between variables, whereas PCA explains their variability (22). In Image Master Platinum, the gels (images) are used for the loading plot and the proteins for the score plot. In our case, factor 1 (explaining the majority of the variability) is associated with protein abundance, and factor 2 with inter-group variability. As stated above, this can be useful to improve the image analysis and to detect mismatched protein spots, but to explore the inter- and intra-group variability of the biological samples, it is better to export the data to a dedicated statistical program. For an example of classical staining combined with uni- and multivariate analysis, see Pedreschi et al. (40).

[Figure: panels A–C plot the %vol of spot 1373 for experimental groups A and B; panel D plots cumulative frequency against %vol for groups A and B.]

Fig. 5. Example of a Kolmogorov–Smirnov test. (A) Descriptive statistics displaying the experimental mean and standard deviation of the two experimental groups (A and B). (B) Descriptive statistics of the individual biological samples of the two experimental groups. (C) The data sorted in ascending order. (D) Empirical cumulative distribution functions of the two experimental groups.
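For readers who want to follow the logic of such a test outside Image Master, the minimal sketch below computes the two-sample Kolmogorov–Smirnov statistic D (the largest vertical distance between the two empirical cumulative distribution functions, as in Fig. 5D) and an exact p-value obtained by enumerating every way of splitting the pooled values into two groups of the original sizes. Exhaustive enumeration is only practical for small designs such as 6 versus 6 replicates (924 splits). The function names and %vol values are hypothetical, and Image Master may compute its p-values differently.

```python
from itertools import combinations


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    d = 0.0
    for x in sorted(set(a) | set(b)):
        f_a = sum(v <= x for v in a) / len(a)  # empirical CDF of a at x
        f_b = sum(v <= x for v in b) / len(b)  # empirical CDF of b at x
        d = max(d, abs(f_a - f_b))
    return d


def ks_exact_test(a, b):
    """Observed D and exact two-sided p-value by enumerating all regroupings.

    Under H0 every split of the pooled values into groups of sizes len(a)
    and len(b) is equally likely; p is the fraction with D >= observed D.
    """
    pooled = list(a) + list(b)
    observed = ks_statistic(a, b)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        ga = [pooled[i] for i in idx]
        gb = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # small tolerance guards against floating-point ties in D
        if ks_statistic(ga, gb) >= observed - 1e-12:
            extreme += 1
    return observed, extreme / total


# Hypothetical %vol values of one spot, six biological replicates per group
spot_a = [0.21, 0.25, 0.19, 0.30, 0.27, 0.23]
spot_b = [0.45, 0.52, 0.38, 0.61, 0.49, 0.55]
d, p = ks_exact_test(spot_a, spot_b)
print(d, p)  # D = 1.0; p = 2/924 (about 0.0022), the smallest possible for 6 vs 6
```

Because the two groups do not overlap at all, D reaches its maximum of 1; only the observed split and its mirror image are that extreme, which bounds how small a p-value six replicates per group can ever produce.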

4.2. DIGE Approach, 4 Conditions

In this example, we are interested in the effects of a specific treatment over time. Using the DIGE approach, we consider here four time points. At each time point, three biological samples were analyzed, quantifying several hundred protein spots (i.e., variables) per sample. To process and analyze the gels, the DeCyder software version 6.5 was used in combination with the EDA module (GE Healthcare). The standardized normalization procedure in DeCyder 2D BVA is based on having, for each gel, the Cy2-labeled internal standard image as a reference. This standard image is used to normalize the abundance ratios between the different gels. DeCyder offers the possibility to transform and normalize the data into a log standardized abundance (Eq. 4).

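$$\text{Log standardized abundance} = \log_{10}\left(\frac{\mathrm{vol}_{\mathrm{Cy5}}\ \text{or}\ \mathrm{vol}_{\mathrm{Cy3}}}{\mathrm{vol}_{\mathrm{Cy2}}}\right) \qquad (4)$$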
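Eq. 4 can be illustrated with a short sketch: each spot volume from the Cy3- or Cy5-labeled sample is divided by the volume of the matching spot in the Cy2-labeled internal standard on the same gel, and the ratio is log10-transformed. Because the same standard is run on every gel, a gel-wide loading difference that scales both channels cancels out. The function name and volumes are hypothetical, and DeCyder's own normalization may include further steps that are not reproduced here.

```python
import math


def log_standardized_abundance(sample_volumes, standard_volumes):
    """log10(vol Cy3 or Cy5 / vol Cy2) for matched spots on one gel (Eq. 4)."""
    if len(sample_volumes) != len(standard_volumes):
        raise ValueError("each sample spot needs a matched Cy2 standard spot")
    return [math.log10(s / r) for s, r in zip(sample_volumes, standard_volumes)]


# Hypothetical spot volumes on two gels; gel 2 was loaded twice as heavily,
# so both its sample and its Cy2 standard volumes are doubled.
gel1 = log_standardized_abundance([2000.0, 500.0], [1000.0, 1000.0])
gel2 = log_standardized_abundance([4000.0, 1000.0], [2000.0, 2000.0])
print(gel1)  # log10(2) and log10(0.5), i.e. about 0.301 and -0.301
print(gel2)  # the same values: the loading difference cancels in the ratio
```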

Using the DIGE approach, Karp and Lilley gathered reasonable arguments to assume that the restrictive assumptions of parametric statistics are not violated too strongly after the logarithmic transformation of the standardized abundance (1). The use of parametric statistics therefore seems acceptable. However, univariate statistics test the individual variables one by one and cannot reveal correlations between multiple proteins. Moreover, testing hundreds of variables (protein spots) one by one and reporting them while accepting a certain risk of false positives (α) increases the chance of reporting false-positive cases (the multiple testing issue). It is therefore advisable to first gain insight into the complex dataset by exploring it via multivariate analysis, and then to validate the individual differences via univariate statistics.

Not all proteins are relevant for understanding the differences between the time points. It is therefore useful to distinguish relevant proteins from irrelevant ones whose abundance does not change over time. To facilitate the discovery of the differences, we used the PCA of the extended data analysis (EDA) module of DeCyder. PCA reduces more than 1000 variables to a few PCs that explain most of the variance between the treatment times. PCA is unsupervised, meaning that the samples are analyzed without knowledge of the sampling time. In Fig. 6, the score and loading plots are displayed for the two most important PCs. The different replicates of the same time point cluster together, and the most important PC (i.e., PC1) separates the clustered treatment times. In practice, this means that proteins with a high positive PC1 value will be abundant in the 2-day gels and less abundant in the 14-day gels, and vice versa for proteins with a highly negative PC1 value. Proteins that cluster together have a similar impact on the PCs and a similar expression pattern (Fig. 6). This rough approach explains only a small part of the variability. The first PC explains 34.2% of the variability and accounts for a great part of the inter-group biological variability (time effect).
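The multiple testing issue mentioned above can be quantified with simple arithmetic. If m spots that do not change are each tested at significance level α, about m·α of them are expected to be declared significant by chance, and the probability of at least one false positive (the family-wise error rate, assuming independent tests) is 1 − (1 − α)^m. The sketch below computes both, together with the Bonferroni per-test level α/m as one simple, conservative correction; it is an illustration rather than part of the analysis reported here, and the spot count is hypothetical.

```python
def multiple_testing_summary(m, alpha=0.05):
    """Expected false positives, family-wise error rate and Bonferroni level
    for m independent tests of true null hypotheses at level alpha."""
    expected_false_positives = m * alpha
    fwer = 1.0 - (1.0 - alpha) ** m
    bonferroni_alpha = alpha / m
    return expected_false_positives, fwer, bonferroni_alpha


# Hypothetical: 1000 unchanged spots, each tested at alpha = 0.05
efp, fwer, per_test = multiple_testing_summary(1000)
print(f"expected false positives: {efp:.0f}")         # 50
print(f"P(at least one false positive): {fwer:.4f}")  # 1.0000
print(f"Bonferroni per-test level: {per_test:g}")     # 5e-05
```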

A high positive PC1 value is correlated with 2 days, and a high negative value with 14 days. Most proteins cluster around the origin, indicating that they contribute little to the variance and probably do not change in abundance during the examined time period. The second PC explains 15.1% of the variability and seems to explain mainly (technical) intra-group variability. By default, EDA ignores missing values. By anticipating the missing value issue, taking the average of each experimental group, and thereby reducing some of the technical variability, the first component explains 60.9% of the variability and the second PC 23.4%. Taking into account only the proteins that have been matched and detected in all the gels reduces the number of examined proteins by more than 50% and discards very useful proteins that have, for instance, a very low abundance in the early days of treatment and higher abundances at the end, and vice versa.

Fig. 6. PCA analysis. (A) Score plot. The large circle is based on the Hotelling's T² statistic and is used to detect outlying observations (0.95 confidence level). The three biological replicates of the same experimental group cluster together, indicating an acceptable intra-group variability (grey ellipse). The different experimental groups are also separated, indicating a certain inter-group variability. There is a clear difference between 2 and 14 days of treatment. (B) The loading plot indicates the correlation between the original variables. A protein with a high loading score for a specific PC explains an important part of the sample variance.
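To show what the explained-variance percentages quoted above measure, the following sketch extracts the first principal component of a small gels-by-spots table by power iteration on the covariance matrix and reports the fraction of the total variance it explains. It is a plain-Python illustration under simplifying assumptions (complete data, no scaling of the variables) and not necessarily the algorithm used by the DeCyder EDA module; the function name and the data are hypothetical.

```python
import math


def first_principal_component(data, n_iter=1000):
    """PC1 of data (rows = gels/samples, columns = spots) by power iteration.

    Returns (loadings, scores, explained_fraction).
    """
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in data]
    # Sample covariance matrix of the spots
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Generic start vector; power iteration fails only if it is orthogonal to PC1
    v = [1.0 + 0.1 * j for j in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient gives the PC1 eigenvalue (variance along PC1)
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p)) for a in range(p))
    total_variance = sum(cov[a][a] for a in range(p))
    scores = [sum(r[j] * v[j] for j in range(p)) for r in centred]
    return v, scores, eigenvalue / total_variance


# Hypothetical table: 4 gels x 3 spots; spot 2 tracks spot 1, spot 3 barely varies
table = [[1.0, 2.1, 0.5], [2.0, 3.9, 0.4], [3.0, 6.2, 0.6], [4.0, 7.8, 0.5]]
loadings, scores, frac = first_principal_component(table)
print([round(x, 3) for x in loadings], round(frac, 3))
```

Spots whose loadings are close to zero (like spot 3 here) correspond to the proteins clustering around the origin of the loading plot.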

As an example, we focus on five proteins that appear highly correlated in the loading plot (highlighted in Fig. 6B). Confirmatory differential expression analysis via ANOVA confirms that all five proteins have a very similar expression pattern over time (Fig. 7). This might suggest a common regulatory mechanism or an interaction between the proteins. For four of the five proteins, the individual confirmatory univariate statistics (ANOVA and multiple comparison test) confirm that 2 days is significantly different from 4, 8, and 14 days, and that 14 days is significantly different from 4 and 8 days (p ≤ 0.01). We identified four proteins as lectin isoforms (39), confirming, at a first level, the correlation between the proteins. One protein could not be identified and is under further investigation. This protein is likely to share a common regulatory mechanism (being also a lectin-like protein), to form a complex, or to interact with the lectin proteins. It shows exactly the same expression pattern as the four identified lectins, but its overall ANOVA has a p value of 0.0122. This is a nice illustration of how exploratory data analysis performs: it indicates correlations but also brings up candidate markers that would have been missed using confirmatory data analysis alone at a threshold of p ≤ 0.01.

Fig. 7. Confirmatory differential expression analysis: expression pattern of the individual proteins selected from Fig. 6. The normalized relative abundances are displayed for the different time points (14 days, 8 days, 4 days, and 2 days). The mean of each individual isoform is displayed as a cross.
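The confirmatory one-way ANOVA used above compares the between-group and within-group variability of one spot across the four time points. A minimal sketch of the F statistic follows; the p-value would then be read from an F distribution with (k − 1, N − k) degrees of freedom, e.g. (3, 8) for four time points with three replicates each. The function name and abundance values are hypothetical, and the multiple comparison (post hoc) test is not included.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of groups (e.g. time points).

    Returns (F, df_between, df_within).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variability of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variability of the replicates around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical log standardized abundances of one spot at 2, 4, 8 and 14 days
days = [[0.42, 0.38, 0.45], [0.10, 0.05, 0.12], [0.02, 0.08, 0.01], [-0.30, -0.25, -0.35]]
print(one_way_anova_f(days))  # F with df = (3, 8)
```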

5. Conclusions

The experimental conditions are important and must be well designed. Ideally, only biological replicate samples should be used, and technical variability should be kept to a strict minimum. A reliable sample preparation and extensive experience in electrophoresis and proteomic techniques are indispensable. Given the low technical variance observed with the DIGE approach, the need for analyzing technical replicates can be questioned. Pooling samples reduces the biological variance and thus helps to detect changes in protein abundance between the averages of the experimental groups. Pooling might be useful but must be reconsidered for each individual experimental setup.

The choice of a particular staining method should be carefully considered, taking into account the available lab equipment, the budget, and the power of the method. The dynamic range of the staining method and the technical variability have a great impact on the power of a statistical test and are decisive for the experimental setup (the number of replicates) and for the choice of the statistical test.

Univariate statistics test the individual variables one by one and cannot reveal correlations between multiple proteins. Moreover, testing hundreds of variables (protein spots) one by one while accepting a certain risk of false positives (α) increases the chance of reporting false-positive cases (the multiple testing issue). Therefore, it is advisable to first gain insight into the complex dataset by exploring it via multivariate analysis, and then to validate the individual differences via univariate statistics. With a classical staining approach, given the typical heterogeneity of variance associated with classical dyes and the limited sample sizes, a non-parametric test seems to be the best choice. With the DIGE approach, the restrictive assumptions of parametric statistics are not violated too strongly after the logarithmic transformation of the standardized abundance, so the use of parametric statistics seems acceptable.
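Because technical and biological variability determine how many replicates are needed, a rough planning calculation can help. The sketch below uses the common normal-approximation formula for comparing two group means, n = 2(z₁₋α/₂ + z₁₋β)²σ²/Δ² replicates per group, where σ is the standard deviation of a spot's (e.g., log-transformed) abundance and Δ is the difference to be detected. This textbook approximation is added for illustration and is not a procedure from this chapter; it assumes roughly normal data with equal variances, and the σ and Δ values are hypothetical.

```python
import math
from statistics import NormalDist


def replicates_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Approximate replicates per group for a two-sided two-sample comparison."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)


# Halving the variability cuts the required number of replicates about fourfold
print(replicates_per_group(sigma=1.0, delta=1.0))  # 16
print(replicates_per_group(sigma=0.5, delta=1.0))  # 4
```

With very few replicates, a calculation based on the t distribution gives somewhat larger numbers than this normal approximation.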

Acknowledgments

The authors would like to thank Romina Pedreschi for critical reading and suggestions and Prof. Verbeke for sharing his files. Financial support from the Belgian National Fund for Scientific Research (FWO-Flanders) is gratefully acknowledged.


References

1. Karp, N. A. & Lilley, K. S. (2005) Proteomics 5, 3105–3115.

2. Urfer, W., Grzegorczyk, M., & Jung, K. (2006) Proteomics S2, 48–55.

3. Carpentier, S. C., Witters, E., Laukens, K., Deckers, P., Swennen, R., & Panis, B. (2005) Proteomics 5, 2497–2507.

4. Bjellqvist, B., Ek, K., Righetti, P. G., Gianazza, E., Gorg, A., Westermeier, R., & Postel, W. (1982) J. Biochem. Biophys. Methods 6, 317–339.

5. Westermeier, R. (2001) Electrophoresis in Practice. Wiley-VCH, Weinheim.

6. Westermeier, R. & Naven, T. (2002) Proteomics in Practice. Wiley-VCH, Weinheim.

7. Rabilloud, T. (2000) Proteome Research: Two-Dimensional Gel Electrophoresis and Identification Methods. Springer, Heidelberg.

8. Neter, J., Kutner, M. H., Nachtsheim, C. J., & Wasserman, W. (1996) In: Applied Linear Statistical Models (Neter, J., Kutner, M. H., Nachtsheim, C. J., & Wasserman, W., eds.). Irwin, Chicago, pp. 958–1010.

9. Neter, J., Kutner, M. H., Nachtsheim, C. J., & Wasserman, W. (1996) In: Applied Linear Statistical Models (Neter, J., Kutner, M. H., Nachtsheim, C. J., & Wasserman, W., eds.). Irwin, Chicago, pp. 1121–1164.

10. Karp, N. A., Spencer, M., Lindsay, H., O'Dell, K., & Lilley, K. S. (2005) J. Proteome Res. 4, 1867–1871.

11. Patton, W. F. (2000) Electrophoresis 21, 1123–1144.

12. Westermeier, R. (2006) Proteomics S2, 61–64.

13. Switzer, R. C., Merril, C. R., & Shifrin, S. (1979) Anal. Biochem. 98, 231–237.

14. Rabilloud, T., Vuillard, L., Gilly, C., & Lawrence, J. (1994) Cellular and Molecular Biology 40, 57–75.

15. Unlu, M., Morgan, M. E., & Minden, J. S. (1997) Electrophoresis 18, 2071–2077.

16. Alban, A., Currie, I., Lewis, S., Stone, T., & Sweet, A. C. (2002) Mol. Biol. Cell 13, 407A–408A.

17. Alban, A., David, S. O., Bjorkesten, L., Andersson, C., Sloge, E., Lewis, S., & Currie, I. (2003) Proteomics 3, 36–44.

18. Tonge, R., Shaw, J., Middleton, B., Rowlinson, R., Rayner, S., Young, J., Pognan, F., Hawkins, E., Currie, I., et al. (2001) Proteomics 1, 377–396.

19. Neter, J., Kutner, M. H., Nachtsheim, C. J., & Wasserman, W. (1996) In: Applied Linear Statistical Models (Neter, J., Kutner, M. H., Nachtsheim, C. J., & Wasserman, W., eds.). Irwin, Chicago, pp. 95–152.

20. Gustafsson, J. S., Ceasar, R., Glasbey, C. A., Blomberg, A., & Rudemo, M. (2004) Proteomics 4, 3791–3799.

21. Siegel, S. & Castellan, N. J. (1988) Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill Book Company, Singapore.

22. Jackson, J. E. (2003) A User's Guide to Principal Components. Wiley, New York.

23. Sharma, S. Applied Multivariate Techniques. Wiley, Hoboken, NJ.

24. Pearson, K. (1901) Phil. Mag. Ser. B. 2, 559–572.

25. Hotelling, H. (1933) J. Educ. Psychol. 24, 417–441.

26. Tarroux, P. (1983) Electrophoresis 4, 63–70.

27. Grove, H., Hollung, K., Uhlen, A. K., Martens, H., & Faergestad, E. M. (2006) J. Proteome Res. 5, 3399–3410.

28. Marengo, E., Robotti, E., Bobba, M., Liparota, M. C., Rustichelli, C., Zamoo, A., Chilosi, M., & Righetti, P. G. (2006) Electrophoresis 27, 484–494.

29. Schultz, J., Gottlieb, D. M., Petersen, M., Nesic, L., Jacobsen, S., & Sondergaard, I. (2004) Electrophoresis 25, 502–511.

30. Verhoeckx, K. C. M., Gaspari, M., Bijlsma, S., Van Der Greef, J., Witkamp, R. F., Doornbos, R. P., & Rodenburg, R. J. T. (2005) J. Proteome Res. 4, 2015–2023.

31. Gottlieb, D. M., Schultz, J., Bruun, S. W., Jacobsen, S., & Sondergaard, I. (2004) Phytochemistry 65, 1531–1548.

32. Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., Botstein, D., & Altman, R. B. (2001) Bioinformatics 17, 520–525.

33. Scheel, I., Aldrin, M., Glad, I. K., Sorum, R., Lyng, H., & Frigessi, A. (2005) Bioinformatics 21, 4272–4279.

34. Oba, S., Sato, M., Takemasa, I., Monden, M., Matsubara, K., & Ishii, S. (2003) Bioinformatics 19, 2088–2096.

35. Wood, J., White, I. R., & Cutler, P. (2004) Signal Process. 84, 1777–1788.

36. Karp, N. A., Griffin, J. L., & Lilley, K. S. (2005) Proteomics 5, 81–90.

37. Wold, S. (1985) Encyc. Stat. Sci. 6, 581–591.

38. Nguyen, D. V. & Rocke, D. M. (2002) Bioinformatics 18, 39–50.

39. Carpentier, S. C., Witters, E., Laukens, K., Van Onckelen, H., Swennen, R., & Panis, B. (2007) Proteomics 7, 92–105.

40. Pedreschi, R., Vanstreels, E., Carpentier, S., Robben, J., Noben, J. P., Swennen, R., Lammertyn, J., Vanderleyden, J., & Nicolaï, B. M. Proteomics 7, 2083–2099.


18

Web-Based Tools for Protein Classification

Costas D. Paliakasis, Ioannis Michalopoulos, and Sophia Kossida

Summary

Current proteomics technologies generate large amounts of data, among which the investigator has to identify promising diagnostic/prognostic biomarkers as well as potential therapeutic targets. For the latter, classification of proteins into meaningful families is needed. Current databases, featuring a high level of interconnectivity (cross-referencing), provide the tools necessary to bring various data together, facilitating protein classification and the elucidation of protein function and interplay. This chapter provides guidelines for exploring, with web-based tools, the information-rich peptide sequences generated by proteomics methodologies, with the objective of predicting potential protein function. After proper preprocessing of a query protein sequence (e.g., for internal repeats), known domains can be identified, which helps to divide the query into smaller meaningful parts. Any unclassified remainder of the protein provides the material for low-level comparative analysis aimed at discovering distant homologues or candidate novel domain types, to be verified experimentally.

Key Words: protein classification; domain families; recurrent tertiary structural motifs; sequence–structure relationships; (protein) structural evolution; protein database; homology searches; domain inference; protein structure redundancy.

1. Introduction

From the times of the "one man–one gene" approach, when individual researchers worked on single protein sequences decoded from the corresponding DNA sequences, to the era of high-throughput techniques, when massive automated procedures produce large numbers of peptide sequences, one task has remained virtually the same: individual protein sequences need classification. We humans have an amazing instinctive capability to categorize

From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols<br />

Edited by: A. Vlahou © Humana Press, Totowa, NJ<br />

objects, even the most complex ones, along various kinds of natural or arbitrary schemes. Proteins feature multiple attributes, such as sequence, structure, function, organelle specificity, evolutionary origin, affinity, isoelectric point, and size (not to mention tissue specificity and antigenicity in higher organisms), all of which offer a means for classification. For instance, 2D gel spots corresponding to proteins that have been separated by size and isoelectric point reflect a primary attempt at classification; affinity (e.g., nucleoprotein, lipoprotein, metalloprotein) and function (e.g., enzyme, carrier), both relating to the chemistry of a protein, offer another basis for classification; and basic spectroscopic data, such as those of circular dichroism (which give an estimate of the relative amounts of β-stranded vs. α-helical structure), permit classification into the all-α, all-β, or mixed α/β classes. However, classification schemes based on general attributes (e.g., the physicochemical properties of proteins) suffer from heterogeneity within their classes. For instance, a number of otherwise unrelated proteins can be classified as "metalloproteins."

In general, any classification scheme has to satisfy two requirements with opposing effects: specificity, which leads to particularization (i.e., a higher number of narrower classes), and abstraction, which leads to generalization (i.e., a smaller number of wider classes). In the end, a comprehensive and useful hierarchy is a trade-off between specificity and abstraction (i.e., the most general classes possible that are still useful in some desired way).

Protein structures represent successful solutions to the problem of thermodynamic stability that can, at the same time, accommodate a biologically useful function; they provide the basis for all kinds of radiating variation at the level of protein sequence (and consequently function). Each protein variant that survives the evolutionary pressure of competition against other potential variants has emerged after a series of modifications of various extents; why this is the preferred mode of action is explained later on. Classification schemes based on common ancestry provide the specificity necessary to define sensible protein classes, in contrast to classification schemes that follow general features. In the former, all members of each class share a common tertiary structure across very wide evolutionary spans, while similarities at the level of amino acid sequence remain exploitable, even in cases where they are hard to detect. Therefore, evolution-based classification schemes are not driven by our natural impulse to categorize objects by drawing arbitrary borderlines, but reflect basic principles of the nature of proteins. In fact, classification with respect to evolutionary history and structure comes so naturally that, when function is not preserved, we tend to refer to an "X-like" form within the same family of proteins rather than to a different family.


Protein sequences derived from a common ancestor by divergent evolution share a high degree of similarity (both with each other and with their ancestor, although the latter may be unknown). This similarity persists over quite a wide evolutionary span before it is worn out by divergence and rendered undetectable by direct pair-wise sequence alignments. Conveniently, it is highly unlikely that proteins without a common evolutionary origin share a high degree of similarity; in fact, the higher the similarity, the more recent the speciation. It will be shown how these nearest relatives provide the guidelines for identifying the features that are crucial for the definition of a protein family, before the detection of the most remote relationships is attempted. In conclusion, the amino acid sequence offers a highly specific key to classification, although intermediary members and structure may need to be consulted before any remote members of a class can be detected.
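Pair-wise sequence alignment, the basic operation behind the similarity searches discussed here, can be sketched with the classic Needleman–Wunsch global alignment algorithm. The version below uses a simple match/mismatch score and a linear gap penalty, and reports percent identity over all alignment columns (one of several conventions in use). Real tools such as BLAST use substitution matrices (e.g., BLOSUM62), affine gap penalties and fast local, heuristic alignment, so this is only an illustration of the principle; the function names, scores, and sequences are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of sequences a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b), using '-' for gaps.
    """
    def pair(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Traceback from the bottom-right corner to recover one optimal alignment
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(aligned_a, aligned_b):
    """Identical residues as a percentage of all alignment columns."""
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)


score, top, bottom = needleman_wunsch("PEAS", "PAS")
print(score, top, bottom, percent_identity(top, bottom))  # 1 PEAS P-AS 75.0
```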

The evolution-based classification schemes, as well as the tools available over the web to explore them, are the subject of the following notes. Many researchers in the relevant fields tend to take simple homology searches and domain assignment tools for granted, until an unexpected outcome raises doubt and confusion; it is the authors' intention that by the end of this chapter, the reader will be able to conduct those (otherwise routine) tasks with a higher degree of both awareness and confidence.

2. Materials

The procedure of protein classification comprises several more or less independent steps. Although these steps have been arranged (in the present notes) in the order in which they are usually employed, this order can change, depending on the nature of the information available at each point. Steps can also be omitted if they are unnecessary or their goal has already been accomplished (although performing them will provide further reassurance). Each of the steps described is a small protocol in its own right, and each is implemented by a number of web tools, some of them in several variations. However, improvements in user friendliness on the one hand and in users' skills on the other have made the procedure look like a single protocol; in fact, automation sometimes hides a number of steps, of which only the results can be viewed, in the form of a compiled web page. Instead of listing the websites of all relevant tools, a small and comprehensive selection of entry points, via which a wealth of tools is accessible, is suggested in Table 1. All of these websites provide user-friendly interfaces. It is suggested that the reader browse (and become familiar with) at least these main websites before attempting to delve deeper into the realm of web-based analysis tools.


Table 1
Main Entry Points to the World Wide Web for Protein Classification

ExPASy (www.expasy.org): A wide range of software tools for the analysis of protein sequences and structures, as well as 2D PAGE, can be found here. It also offers an entry point to a rich collection of other websites, mainly the SwissProt/UniProt databases.

BLAST (www.ncbi.nlm.nih.gov/BLAST): A convenient starting point for on-line searches of sequence databases (both protein and DNA). Many other sites feature some version of BLAST as well.

EnsEMBL (www.ensembl.org): A collection of complete genomes, which offers an entry point from a different view, that of a genome rather than that of a sequence.

Pfam (www.sanger.ac.uk/software/pfam): A collection of profiles of protein families against which a sequence can be matched, for initial domain recognition.

Protein Data Bank (www.pdb.org and www.rcsb.org): The archive of experimentally determined 3D structures (by crystallography, NMR, and other techniques) of biological macromolecules (proteins, nucleic acids, sugars, etc.).

InterPro (www.ebi.ac.uk/interpro): An effort to integrate information from several diverse sources into a unified, comprehensible form.

3. Methods

3.1. Theoretical Issues: Classification Based on Sequence or Structure

The specifics that define a set of sequences as a protein family (i.e., molecular function and the amino acid residues involved, other kinds of sequence fingerprints, post-translational modifications, etc.) have to be accommodated within a structural framework (Fig. 1). However, a 3D structure is not reserved for one protein family. In fact, there seems to be a countable set of spatially local packing arrangements between α-helices and β-sheets which, when combined, lead to 3D structural assemblages that are stable in terms of thermodynamics and useful in terms of function (1). The participating elements may be distant along the sequence, or they may even belong to different chains. The small number of packing options leads to the occurrence of common 3D structural themes, termed recurrent tertiary motifs, e.g., "up-and-down" helical bundles, β-barrels, etc. Descriptions at this level of abstraction take into account neither the sequential order of the helices and strands nor their length. Tertiary structural domains in proteins of unrelated evolutionary origin (or function), with apparently unrelated sequences, may adopt the same tertiary motif, usually including further 3D structural elements (2; see also Note 1). It can be claimed that the abstract idea of a recurrent tertiary motif leans toward the basic packing arrangements, whereas the implemented domains are closer to the protein families.

Fig. 1. Complex shapes can be misclassified by a general property like size, because of small (or larger) parts missing in relation to the simplest forms from which they derive. More specific ("shape-related") attributes can bring all stars (and parts thereof) together, as they can do with triangles, squares, and circles. Once a proper overall scheme is in place, general attributes (like color) can then detail the distribution within each class.

The 3D environment of certain positions in the structure (a different set of positions for each recurrent tertiary structural motif) poses physicochemical requirements that can usually be best met by one or a few amino acid types, thus defining a scale of preferences (3). These preferences are reflected in patterns that may arise at the level of the primary sequences that adopt the relevant recurrent tertiary motifs, whenever these spatially defined positions are close along the sequence (Fig. 2). It should be noted that these patterns are reflections along the sequence of the abstract tertiary theme and that they are much more general than the detailed, family-specific sequence fingerprints.

5-vdef sNIR[enpvtpwnpeps]
: * : * + : *+
R1: A PVID PT AYID PE ASVI G
R2: E VTIG AN VMVS PM ASIR S[degm]
R3: P IFVG DR SNVQ DG VVLH A[letineegepiednivevdgkey]
R4: A VYIG NN VSLA HQ SQVH G
R5: P AAVG DD TFIG MQ AFVF -
R6: K SKVG NN CVLE PR SAAI -
R7: G VTIP DG RYIP AG MVVT -
-------------------------
CNS: a VfIG DN vyIa pQ AvVh(g|s) (Consensus)
BS#1 T1 BS#2 T2 BS#3

Fig. 2. The seven repeats that form the β-helix in MT-CA demonstrate the level of impact that structure can have on sequence. The β-strands (groups of four residues) are shown, separated from the intervening "turns" (groups of two). The turns that connect successive repeats are split: one residue at the left end and a second one, which is missing in some cases, at the right end. Parts of the sequence in square brackets [ ] are intervening connecting loops; the part in angle brackets follows this core motif and is not part of the repeat sequence. The His residues that coordinate the Zn atom are underlined and stem from positions (within the repeat) marked by a plus sign (+). A partial repeat (every six positions) has been proposed on the basis of other sequences that adopt this structure; the positions marked by stars (*) correspond to the main positions in this (partial) repeat, and those marked by colons (:) correspond to the secondary ones. No repetition of this kind (i.e., every six positions) is apparent for any other positions, leaving the 17- to 18-residue repeat unit as the only complete one. Positions Asn10–Arg12 (top row) form a small extension of β-sheet #3; preceding residues are shown for completeness and only to emphasize that the repeat does not extend into them. In the consensus, drawn at the bottom row, the main ingredients of the repeat unit are shown in capital letters.

Simplified lattice models suggest that a small number of 3D structural motifs set loose requirements that can be met by a large number of sequences along their evolutionary pathway (4). In this case, nature appears to reuse a successful structural solution in evolutionarily unrelated sequences (see Note 2). At the other end, a large number of 3D structural motifs pose requirements so manifold and exact that only a few sequences can be compatible with them. The resulting patterns of preferences along the sequence are occasionally strong enough to permit structural motif prediction from the sequence alone (5).

Web-Based Tools for Protein Classification 355

It can be claimed that no more than 200 recurrent tertiary structural motifs (the exact number depending on the stringency of their definition) provide the structural basis of perhaps 95% of the nonredundant set of protein structures (2). The average residue coverage is a much smaller figure, because additional structural elements are needed to complete a domain. Conversely, a large number of tertiary structural motifs are so rare that they provide the basis of only the small remaining proportion of protein structures (see Note 3). Detailed specialization into families takes place within this structural framework: Chothia (6) estimated long ago that 95% of the protein information to be discovered will derive from no more than 1000 protein families. In fact, for a substantial (and growing) proportion of newly identified protein sequences, enough information already exists in the databases to build a 3D model (7). The reason lies in a simple fact: during the creation of new protein families, the relatively small number of structural alternatives directs nature toward a strong preference for reusing already successful solutions at the level of sequence (not structure), especially when similar problems are to be solved, rather than discovering new ones on the basis of the same or a different structure. The traits inherited through such reuse of sequences are usually the ones exploited in protein classification. On the other hand, this small set of structural motifs, readily accessible to protein families of unrelated origin and/or function, occasionally leads otherwise unrelated proteins to elevated sequence similarity scores (sometimes apparently too high to be explained by chance), simply because they fold in the same manner (see Note 4). Traits developed independently (as opposed to inherited) reflect convergent evolution.

Protein structure has also served as the basis of classification in some schemes. However, the theoretical considerations discussed herein (in particular, the fact that unrelated proteins may fold in the same way) hint that classification on the basis of 3D structure alone will tend to be on a coarser scale. On the other hand, detailed structural data for a (preferably representative) member of a protein family, determined experimentally by X-ray crystallography or NMR spectroscopy, are a valuable aid to sequence-based classification, besides facilitating many other procedures (e.g., structure-based protein design). Such data provide very solid ground on which to assess any sequence-based classification, and a powerful tool for detecting the most remote members. However, unless one is classifying protein structure per se (rather than proteins in their entirety), a common structural architecture alone does not appear to be sufficient evidence for placing proteins in the same class.

356 Paliakasis et al.

Evolutionarily refined variants of tertiary structural domains, "similar-yet-different" within a given repertoire, appear in different combinations with those of other repertoires: a domain for a different cofactor or regulatory factor (e.g., GDP vs. ADP) may be combined with a catalytic domain for a slightly different substrate (fructose vs. glucose). Thus, the most complicated and finely tuned series of (simpler) functions necessary for life can be accomplished in a spatially ordered and efficient manner. On the other hand, this fact makes it essentially imperative that any classification be carried out in terms of domains: it suffices to describe any sequence in question as comprising "an N-terminal domain of type X and a C-terminal domain of type Y, joined by a loop region of type Z"; otherwise, extensive subtyping and the "Russian doll" effect (see Note 5) will soon be encountered.

In practice, the classification procedure starts with the detection of some similarity between a protein (or part thereof) and a prototype (e.g., a profile extracted from a multiple alignment, or a structure onto which the protein is threaded) that is too high to be explained by chance alone. The tools for demonstrating this similarity are presented under Subheading 3.2; in any case, it is the network of similarities within a set of data (sequences, structures, etc.) that will clarify the underlying reason for the observed similarity.

3.2. The Practical Side

It cannot be stressed enough that most protein sequences are nowadays translations of the corresponding nucleic acid sequences. It is important to identify the cDNA originals, if possible, to ensure that the nucleic acid sequence employed corresponds reliably to the protein. When the original data are supplied as genomic DNA fragments, introns could still be included, and alternative splicing remains a possibility. Current gene recognition programs such as GeneScan (8), normally expected in genome-oriented databases such as EnsEmbl (9) (see Note 6), can efficiently detect and remove introns, but errors may still creep in. If this is the origin of the protein data, certain precautions should be taken:

• Search for related proteins with reliable sequences, e.g., by means of a preliminary Basic Local Alignment Search Tool (BLAST) (10) search against SwissProt (11).

• Align the sequence of interest to any trustworthy matches and observe the pattern of conservation. Sudden insertions in the sequence in question (especially ones with highly biased composition, short tandem repeats, or repetitions of other parts of the protein, particularly partial ones) do not necessarily represent extra features or minidomains; likewise, deleted parts may have been mistakenly considered introns.

• Isolate "candidate" insertions and try to find similar sequences in the databases; check whether any trustworthy match makes sense in terms of biology.

• Alternatively, try to find a protein in the Protein Data Bank (PDB) (12) that has an experimentally determined 3D structure and is similar (even remotely) to the one in question, excluding the insert (see Note 7). The location of the candidate insertion/deletion on the structure may verify or reject it.

• Parts of the query protein matching expressed sequence tags (ESTs) (13) provide an extra source of verification (see Note 8): a part matching an EST is an expressed part.
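The alignment check in the second bullet can be partly automated. The Python sketch below is a hypothetical helper, not part of any tool named in this chapter: given a gapped pairwise alignment (two equal-length strings, `-` for gaps) between the query and a trusted homologue, it lists runs where the query carries residues opposite gaps in the reference and reports how strongly each run is dominated by two residue types. The minimum run length of 5 is an arbitrary assumption.

```python
from collections import Counter


def find_insertions(query_aln, ref_aln, min_len=5):
    """Runs where the query has residues opposite gaps in a trusted reference.

    Returns (start, end, segment) tuples; start/end are 0-based, end-exclusive
    positions in the ungapped query sequence.
    """
    if len(query_aln) != len(ref_aln):
        raise ValueError("aligned strings must have equal length")
    found, run, run_start, q_pos = [], [], None, 0
    for q, r in zip(query_aln, ref_aln):
        if q != "-" and r == "-":            # query residue facing a reference gap
            if run_start is None:
                run_start = q_pos
            run.append(q)
        else:                                 # any open run ends at this column
            if run_start is not None and len(run) >= min_len:
                found.append((run_start, run_start + len(run), "".join(run)))
            run, run_start = [], None
        if q != "-":
            q_pos += 1
    if run_start is not None and len(run) >= min_len:  # run reaching the C-terminus
        found.append((run_start, run_start + len(run), "".join(run)))
    return found


def top2_fraction(segment):
    """Fraction of a segment made up of its two most common residue types."""
    counts = Counter(segment)
    return sum(n for _, n in counts.most_common(2)) / len(segment)


if __name__ == "__main__":
    query = "MKTQQQQQQQQLLVAG"
    ref = "MKS--------LIVAG"
    for start, end, seg in find_insertions(query, ref):
        print(start, end, seg, round(top2_fraction(seg), 2))  # 3 11 QQQQQQQQ 1.0
```

A run with a fraction close to 1.0, like the poly-Q stretch in the example, is exactly the kind of biased insertion the bullet warns about; it still needs the database and structure checks listed above before being accepted or rejected.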

Other criteria may apply to verify the integrity of a processed putative gene. For example, if the protein has been biochemically characterized, then any experimentally observed property must match those of the sequence predicted from the gene (or there must be a good reason why it does not).

Another very serious issue is that many annotations are automatically transferred between similar sequences within the same or different databases. Even SwissProt entries are crowded with annotations assigned "by similarity." The number of proteins with primary annotations is orders of magnitude smaller than the number of annotated sequences in current databases. These annotations should be considered hints that can direct experiments toward promising routes, rather than secure data.

3.2.1. Preprocessing the Query

A preliminary check of the protein sequence itself is recommended. Repeats and parts of low complexity are of particular interest.

3.2.1.1. Repeats

Regularities in biological macromolecular structure (like the helical nature of DNA or the supercoiled structure of some protein assemblies) and multimerization create room for repetitions along protein sequences. Repeats can range in length from a few amino acid residues to complete domains (e.g., as a result of domain duplication).

In the latter case, the repetition count is usually small, just two or three copies (14), although much higher counts do occur. When catalytic domains are repeated, the situation may have no basis in structural regularities; it may, for instance, reflect a need for efficiency (e.g., cooperativity between different copies of a domain). In database searches with multidomain protein queries, it is in any case recommended to treat different domains separately, for reasons explained later on; the difference here is that the separate copies can be aligned, and their consensus (or profile) can be extracted and serve as the query.

On the other hand, short tandem repeats (e.g., about 10 amino acid residues long or shorter) normally reflect some structural regularity. In a dot-plot style alignment of a protein sequence to itself, they manifest themselves as a (moderate-to-high) number of tracks that run parallel to the main diagonal (and to each other) in a regular manner (Fig. 3). Since combinations of parts coming from different tracks produce significant alternative alignments, procedures that attempt to report all possible alternative alignments between two proteins will be severely confounded (see Note 9 on BLAST in particular).
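A self dot plot and its diagonal tracks can be illustrated with a minimal Python sketch. The window of 5 residues and the 4-of-5 identity rule are illustrative assumptions (DOTLET and similar utilities use substitution-matrix scores and their own settings). For a tandem repeat, the densest off-diagonal track lies at an offset equal to the repeat period, with further parallel tracks at its multiples.

```python
def self_dotplot(seq, window=5, min_matches=4):
    """Windowed dot plot of a sequence against itself.

    Cell (i, j) is 1 when the windows starting at i and j share at least
    `min_matches` identical residues, position by position; otherwise 0.
    """
    n = len(seq) - window + 1
    return [[int(sum(seq[i + k] == seq[j + k] for k in range(window)) >= min_matches)
             for j in range(n)] for i in range(n)]


def track_density(plot, offset):
    """Fraction of filled cells on the off-diagonal track j = i + offset."""
    n = len(plot)
    cells = [plot[i][i + offset] for i in range(n - offset)]
    return sum(cells) / len(cells)


def dominant_period(seq, window=5, min_matches=4):
    """Offset (up to half the plot size) of the densest off-diagonal track."""
    plot = self_dotplot(seq, window, min_matches)
    offsets = range(1, len(plot) // 2 + 1)
    return max(offsets, key=lambda d: track_density(plot, d))


if __name__ == "__main__":
    seq = "MS" + "AVDLGKE" * 5 + "QW"   # hypothetical 7-residue tandem repeat
    print(dominant_period(seq))          # 7
```

Restricting the search to offsets up to half the plot size avoids the short tracks near the corner of the plot, where a handful of cells can give a misleadingly high density.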

A consensus or a profile may again be extracted by a proper alignment of the repeats. However, statistically significant matches cannot be expected for a resulting query only (say) 6 or 12 amino acid residues long. One possible cure is to concatenate a small number of repeats, producing a query no longer than 50 amino acid residues (see Note 10 on why 50). Keeping the number of repeats small (e.g., four repeats of length 11) helps avoid an explosion of alternative alignments, although a few alternatives cannot be avoided completely. If this step is taken, it is suggested that the output of a dot-plot utility (such as DOTLET, a Java-based tool hosted on the ExPASy server; Table 1) be consulted at all times.
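The two steps above (a consensus from the aligned repeat copies, then a short concatenated query) might look like this in Python. The majority-vote consensus and the 50-residue cap follow the text; the function names and the example repeat copies are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_repeats):
    """Most common residue in each column of pre-aligned, equal-length repeat copies.

    Ties go to the residue seen first in the column (Counter.most_common order).
    """
    width = len(aligned_repeats[0])
    if any(len(r) != width for r in aligned_repeats):
        raise ValueError("repeat copies must be aligned to equal length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned_repeats))


def concatenated_repeat_query(consensus, max_len=50):
    """Tile the consensus as many whole times as fit within max_len residues."""
    copies = max_len // len(consensus)
    if copies == 0:
        raise ValueError("consensus is longer than max_len")
    return consensus * copies


if __name__ == "__main__":
    copies = ["LKELAEKLKAL", "LKELAEKVKAL", "IKELAEKLKAL", "LKEIAEKLKAL"]
    cons = majority_consensus(copies)
    query = concatenated_repeat_query(cons)
    print(cons, len(query))  # LKELAEKLKAL 44
```

For an 11-residue unit the cap yields exactly the four copies mentioned in the text (44 residues).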

3.2.1.2. Parts of Low Complexity

Low complexity occurs when some part of the sequence comprises only a few types of amino acid residues, leading database queries to nonspecific results (see Note 11); the situation can be even worse if some of these types are similar to each other. In general, it is important to know beforehand about any significant deviations in amino acid composition, as well as about special features such as signal peptides or clusters of biologically relevant charged side chains (see Note 12). Search procedures such as BLAST (10) detect stretches of low complexity and offer to ignore them during the search; however, what appears to be a part of low complexity may in fact be, e.g., a transmembrane stretch. The action to take depends on both the importance and the position of the stretch:

• If a single transmembrane part makes sense (or is known to exist), the extracellular and intracellular moieties can be used as separate queries.

• A signal peptide (especially when located at the very N-terminus) can usually be excluded from the procedure, profitably or at least without harm.

• A stretch of low complexity that appears to be of no special significance in terms of structure/function/evolution is best left to the search procedure to mask.

Relevant tools are available on the Web (e.g., at the ExPASy site). Alternatively, a simple dot-plot style alignment of the protein sequence against itself can be run. Besides repeats, this will reveal areas of low complexity as square blocks of elevated average score, symmetrical about the main diagonal (Fig. 3). If low complexity occurs within the boundaries of a repeat, similar square blocks will appear around the relevant parallel off-diagonal tracks.
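A compositional scan offers a complementary view to the dot plot. The Python sketch below flags windows whose Shannon entropy of residue composition falls at or below a cutoff, merges overlapping windows, and masks the result with X. It is a simplified stand-in for the masking filters used by search programs, not a reimplementation of any of them; the 12-residue window and the 2.2-bit cutoff are illustrative assumptions.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Shannon entropy (in bits) of the residue composition of a segment."""
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in Counter(segment).values())


def low_complexity_regions(seq, window=12, max_entropy=2.2):
    """Merge low-entropy windows into (start, end) regions, 0-based and end-exclusive."""
    regions = []
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) <= max_entropy:
            start, end = i, i + window
            if regions and start <= regions[-1][1]:   # overlaps the previous region
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


def mask(seq, regions, char="X"):
    """Replace every residue inside the given regions with `char`."""
    out = list(seq)
    for start, end in regions:
        out[start:end] = char * (end - start)
    return "".join(out)


if __name__ == "__main__":
    seq = "MKTWYDERNC" + "AAAAAGAAAAVAAAA" + "HPFWYMQRTS"  # hypothetical Ala-rich stretch
    regions = low_complexity_regions(seq)
    print(regions)  # [(6, 29)]
    print(mask(seq, regions))
```

In the example the masked region (positions 6 to 28) extends four residues into each flank of the Ala-rich stretch (positions 10 to 24), a typical edge effect of fixed windows; boundaries from such scans are best read as approximate, which is one more reason to inspect the dot plot as well.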

3.2.2. Inference of Domains

In the spirit of the theoretical analysis earlier in this chapter, classification can take the form of assigning parts of the sequence to domains. Hence, using a domain inference tool such as those offered by Pfam (15) and SMART (16) should be among the first steps in classifying a protein on the basis of its sequence (see Note 13). This information serves to divide the sequence of interest into pieces that can be handled separately (see Note 14).

Given the high coverage achieved by these collections (more than 75% of proteins have at least one domain recognized by them, and on average about two-thirds of the length of a protein can be described this way) (15), some protein sequence classification efforts end here (see Note 15). In fact, database search procedures can soon be expected to exploit high-level features extracted from the query and the relevant sequences, resorting to amino acids alone only for the parts where these attempts fail.
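Once domain boundaries are known (e.g., read off a Pfam or SMART report), the coverage figure quoted above and the leftover stretches that still need attention can be computed in a few lines of Python. The boundaries in the example are hypothetical 1-based inclusive ranges, and the 30-residue minimum for a leftover stretch echoes the "30–40 residues" rule of thumb given later in this section; both are assumptions of the sketch.

```python
def domain_coverage(seq_len, domains, min_gap=30):
    """Fraction of residues covered by domain hits, and the uncovered stretches.

    domains: 1-based, inclusive (start, end) ranges (hypothetical input format).
    Returns (fraction_covered, [(start, end), ...]) listing uncovered stretches
    of at least min_gap residues, also 1-based and inclusive.
    """
    covered = [False] * (seq_len + 2)  # positions 1..seq_len used; padding at both ends
    for start, end in domains:
        for pos in range(max(start, 1), min(end, seq_len) + 1):
            covered[pos] = True
    fraction = sum(covered[1:seq_len + 1]) / seq_len
    stretches, run_start = [], None
    for pos in range(1, seq_len + 2):       # seq_len + 1 acts as a sentinel
        free = pos <= seq_len and not covered[pos]
        if free and run_start is None:
            run_start = pos
        elif not free and run_start is not None:
            if pos - run_start >= min_gap:
                stretches.append((run_start, pos - 1))
            run_start = None
    return fraction, stretches


if __name__ == "__main__":
    # Hypothetical 400-residue protein with two recognized domains.
    print(domain_coverage(400, [(20, 140), (150, 260)]))  # (0.58, [(261, 400)])
```

Short unassigned pieces (here residues 1 to 19 and the 141 to 149 linker) are reported as covered-enough; only the long C-terminal stretch would be carried forward to the searches described next.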

3.2.3. Querying Other Databases

Despite the current high coverage of protein sequences in terms of known domains, parts of these sequences still elude assignment. These parts may simply be too distant members of the families they belong to, having failed the thresholds of automatic procedures. Such parts should be isolated, properly preprocessed (mainly for compositional biases), and queried against SwissProt and PDB.

• Entries (records) in SwissProt (11) offer rich annotation and cross-references to a number of resources, all in a mainly human-readable form and behind a user-friendly interface. The high level of curation (including annotation derived by similarity) will save duplicate effort and may provide valuable hints on how to move on.

Fig. 3. (Continued) (A) Schematic representation of a dot-plot style alignment of a protein against itself; to depict the special cases presented in the text, the protein is assumed to feature two copies of some domain, a low-complexity N-terminus, and a C-terminal part dominated by some short internal repeat, except for a tail that appears unique. (B) Alignment of a small part of low complexity (from a real protein) against itself. The situation here is worse than it might first appear, because the few types of amino acid residues present are related to each other (alanine to valine and glycine and, to a lesser extent, to proline and serine).

• A search for similar sequences in PDB (12) will reveal experimentally determined 3D structures of proteins possibly related (e.g., through evolution) to the protein of interest. A 3D structure offers a model to think about (even before a model of the query sequence is built from this information), a toy on which data can be visualized and handled far more efficiently (see Note 16).

If domains are inferred by the relevant procedures (or supplied by SwissProt annotation) and/or long stretches (say, 30–40 amino acid residues or longer) of special behavior are observed, it is a good idea to handle each sequence part separately, or in small meaningful combinations; for instance, there may be no reason to treat a propeptide separately from the main body of the domain it belongs to (see Notes 17 and 18).

If a few top hits of a database search can be aligned to the query with confidence, and the next ones are marginal (see Note 18), a multiple alignment of the best hits (including the query) should be converted into some kind of profile [e.g., a position-specific scoring matrix (PSSM)], and the database should be scanned with the resulting profile (see Note 19). Marginal hits of the initial query (i.e., the protein of interest) that match positions conserved throughout the profile will have their statistical significance increased and will surface. If domain inference programs can detect some kind of domain on those (initially marginal) hits, this information can then be transferred to the initial query with confidence (recall that the query is the part on which no domain was detected).
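The core idea of a PSSM, log-odds scores per column, can be sketched in Python as below. This is a deliberately simplified illustration and not how PSI-BLAST or other production tools build profiles (they add sequence weighting, substitution-matrix-based pseudocounts, gapped alignment, and proper statistics); the flat background and the single additive pseudocount are assumptions of the sketch, and the aligned block is hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Log-odds PSSM (in bits) from equal-length aligned sequences.

    Simplifications (assumptions of this sketch): flat 1/20 background,
    one additive pseudocount per residue type, no sequence weighting;
    gaps and non-standard letters are not counted.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                     for aa in AMINO_ACIDS})
    return pssm


def best_window_score(pssm, sequence):
    """Best ungapped placement of the profile on a sequence: (score, offset).

    Non-standard residues score 0; returns (-inf, -1) if the sequence is
    shorter than the profile.
    """
    width = len(pssm)
    best = (float("-inf"), -1)
    for offset in range(len(sequence) - width + 1):
        score = sum(pssm[i].get(sequence[offset + i], 0.0) for i in range(width))
        best = max(best, (score, offset))
    return best


if __name__ == "__main__":
    block = ["HGDCLK", "HGECLK", "HGDCIK", "HGDCLR"]  # hypothetical aligned hits
    profile = build_pssm(block)
    print(best_window_score(profile, "MSTPAHGDCLKWQ")[1])  # 5
```

Scanning a database with the profile amounts to calling `best_window_score` on every sequence and ranking the scores; residues that match fully conserved columns contribute the largest terms, which is why hits matching the conserved positions rise in the list.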

Sometimes the few top hits will themselves be marginal (see Note 18). In that case, each of the "best" marginal hits should be used as a query, and a number of its homologues (about 10; see Note 20) should be collected and aligned without the initial query (i.e., the protein of interest). Profiles of some kind (e.g., PSSMs) should be produced from those alignments, and the relevant part of the initial query should be aligned against them. If the initial query matches the profile at conserved positions (see Note 21), the hit was not fortuitous. Again, if domain inference programs can detect some kind of domain along the sequences that formed the profile, this information can then be transferred to the initial query with confidence.
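The "matches the profile at conserved positions" test can be made concrete with a small Python sketch. It assumes the query has already been aligned onto the block (e.g., with Clustal's profile alignment, Note 19) so that all strings have equal length; the 80% conservation cutoff and the example sequences are illustrative assumptions.

```python
from collections import Counter


def conserved_columns(alignment, min_fraction=0.8):
    """(column index, residue) pairs where one residue type fills >= min_fraction of rows."""
    result = []
    for idx, column in enumerate(zip(*alignment)):
        residue, count = Counter(column).most_common(1)[0]
        if residue != "-" and count / len(column) >= min_fraction:
            result.append((idx, residue))
    return result


def agreement_at_conserved(query_aligned, alignment, min_fraction=0.8):
    """Fraction of conserved columns at which the aligned query carries the conserved residue."""
    conserved = conserved_columns(alignment, min_fraction)
    if not conserved:
        return 0.0
    hits = sum(query_aligned[idx] == residue for idx, residue in conserved)
    return hits / len(conserved)


if __name__ == "__main__":
    block = ["HGDCLKPW", "HGECLKAW", "HGDCIKSW", "HGDCLRTW"]  # hit plus its homologues
    print(conserved_columns(block))                    # [(0, 'H'), (1, 'G'), (3, 'C'), (7, 'W')]
    print(agreement_at_conserved("HGNCVRGW", block))   # 1.0
    print(agreement_at_conserved("AGNSVRGF", block))   # 0.25
```

A query that agrees at every conserved column (first example) supports a genuine relationship even if it differs elsewhere; one that matches mainly at variable columns (second example) suggests the original hit was fortuitous.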

Other databases provide high-level annotation for specific tasks. InterPro (17) offers a convenient entry point to a number of them, especially for manual sequence classification (as opposed to some massive automated procedure). SuperFamily (18) builds on the classification of 3D structures (a hit here implies structural similarity regardless of common function or evolutionary origin); PRINTS (19) and PROSITE (20) follow, and one may continue with a long list in which each member targets a specific problem (e.g., if the protein of interest is found to be a peptidase, MEROPS (21) may be consulted for further classification).


4. Notes

1. Often, a 3D domain formed by (part of) a sequence performs just a simple operation (e.g., a single function). For instance, there are tertiary structural domains that simply bind a cofactor and feature an allosteric site, where some regulatory factor (e.g., ADP) docks to exert its role. The active site may reside on a separate domain, or may be shared between two of them, within reach of the cofactor.

2. Unpublished work (C.D.P., Ph.D. thesis) in continuation of (3) suggests that the requirements set (albeit too vaguely) by an α-helical "up-and-down" bundle, an abundant tertiary structural motif, raise the relevant parts of the sequence into the extreme 0.1–1% of a suitable distribution when proteins in a databank are scored for compatibility. This shift is not enough for structure prediction from the sequence alone (too many false positives), but it still reflects a possibly minimal set of requirements posed by the structure on compatible sequences.

3. There is a tendency to treat the observed structural solutions, i.e., the recurrent tertiary structural motifs and domains, as the evolutionary end products of our days. In fact, all the preceding evolutionary steps (and probably the future ones as well) had to employ one of the solutions provided by this relatively narrow set. If we depict this set so that similar architectures are close to each other, then "evolution" is a "walk" through this set. Whether this set is continuous or partitioned in a discontinuous manner is the subject of ongoing research.

4. A continuum is thus established on the scale of similarities between protein sequences: at one end, small biases due to simple facts (e.g., two transmembrane pieces are coincidentally matched); in the middle of the scale, remote similarities due to a common structural architecture; and at the other end, 30% (or more) identity between a mammalian protein and its bacterial homologue owing to common origin (and usually more than 80%, e.g., between mammals).

5. This effect characterizes the situation in which a particular domain includes a smaller one plus some extra structural elements ("decorations"); the new total then constitutes part of a larger domain, which includes some further structural elements, and so on. Orengo and coworkers (2) have presented a number of examples in their series of papers on the classification of protein structure.

6. The version of BLAST featured in EnsEmbl can run against the results of GeneScan; this does not simply translate genomic DNA into open reading frames (ORFs) before comparison, but also attempts to "splice" it, after predicting and removing potential introns. Other task-specific databases feature similar tools.

7. The version of BLAST at the National Center for Biotechnology Information (NCBI) has access to all protein sequences of known structure. Alternatively, the PDB resource (Table 1) can be accessed directly for this purpose, at the cost, however, of losing the interconnection to other databases offered by NCBI.

8. As in Note 7, access through the means provided by NCBI is recommended.

9. For example, BLAST seeks all the instances where a small part of the query matches a database sequence. Then, to form longer alignments, BLAST, depending on its version, either extends these "seed alignments" into contiguous subalignments uninterrupted by gaps, which are then joined in all valid combinations, or extends the seeds in a gapped-alignment fashion. The presence of short repeats may make the output particularly hard to follow, owing to the numerous alternatives.
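The effect of short repeats on seeding can be illustrated with exact word matches. The Python sketch below is a simplification: BLAST also collects high-scoring "neighbourhood" words rather than only identical ones, and the word length of 3 is an assumption. For a sequence built from a 7-residue repeat compared with itself, seeds fall on seven distinct diagonals, each a candidate alignment, whereas a sequence without internal repeats seeds only the main diagonal.

```python
from collections import defaultdict


def seed_hits(query, subject, k=3):
    """All (query_pos, subject_pos) pairs that share an identical k-letter word.

    Exact words only; this is a simplification of BLAST's neighbourhood words.
    """
    index = defaultdict(list)
    for j in range(len(subject) - k + 1):
        index[subject[j:j + k]].append(j)
    return [(i, j)
            for i in range(len(query) - k + 1)
            for j in index.get(query[i:i + k], [])]


def diagonals(hits):
    """Distinct diagonals (subject_pos - query_pos) on which the seeds fall."""
    return sorted({j - i for i, j in hits})


if __name__ == "__main__":
    repeat = "AVDLGKE" * 4                # hypothetical short tandem repeat
    unique = "MKTWYDERNCHPFGSQILAV"      # no internal repeats
    print(diagonals(seed_hits(repeat, repeat)))  # [-21, -14, -7, 0, 7, 14, 21]
    print(diagonals(seed_hits(unique, unique)))  # [0]
```

Every off-main diagonal is a shifted register of the repeat, and combining pieces from different diagonals is what produces the many alternative alignments described above.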

10. Sander and Schneider (22) suggest that the minimum percentage of identity between two proteins required to imply structural similarity converges to about 27% for alignment lengths of about 80 amino acids. However, the change over the range of 50–80 residues is too small to justify including further repeats, which would increase the number of alternative alignments. See also Note 18.

11. For instance, assume that a stretch about 20 amino acids long or longer is dominated by leucine, isoleucine, and perhaps a couple of phenylalanines. Not only will this part be nonspecifically matched to any sequence that features a similar deviation in composition, but the resulting alignment will also be unstable in this part, because of the numerous, almost equivalent alternative ways in which two such stretches can be aligned.

12. For example, a large deviation toward lysine and alanine will make the sequence look like a histone. When a databank is scanned for similar peptide sequences, the results will tend to include nonspecific stretches rich in positive (and, to a lesser extent, negative) charges in general.
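A quick composition profile shows such deviations before any search is run. In this Python sketch the flat 5% baseline (1/20 per residue type) and the threefold cutoff are simplifying assumptions; real background frequencies differ from residue to residue, and the example sequence is hypothetical.

```python
from collections import Counter


def composition(seq):
    """Relative frequency of each residue type present in the sequence."""
    return {aa: n / len(seq) for aa, n in Counter(seq).items()}


def overrepresented(seq, baseline=0.05, factor=3.0):
    """Residue types whose frequency exceeds `factor` times a flat baseline."""
    return sorted(aa for aa, f in composition(seq).items() if f > factor * baseline)


def charge_fractions(seq):
    """Fractions of positively (K, R) and negatively (D, E) charged residues."""
    n = len(seq)
    positive = sum(seq.count(aa) for aa in "KR") / n
    negative = sum(seq.count(aa) for aa in "DE") / n
    return positive, negative


if __name__ == "__main__":
    tail = "AKKAKAPKKAAKAKKAPAAK"        # hypothetical histone-like Lys/Ala stretch
    print(overrepresented(tail))         # ['A', 'K']
    print(charge_fractions(tail))        # (0.45, 0.0)
```

A stretch flagged this way is a candidate for separate handling (or masking) before its neighbours are used as queries.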

13. The NCBI/BLAST server (Table 1) offers CDD (the Conserved Domain Database), which is based on both Pfam and SMART and further includes collections internal to NCBI. Other servers may offer similar compilations. However, for detailed inquiries one may need to resort to the original resources, where the information presented can be much richer. Furthermore, each specialized collection offers tools for flexible searches in terms of combinations of various domains, to help detect proteins of similar architecture, reference similarities to other related domains, and so on.

14. The fact that tertiary structural domains tend to behave independently should be exploited. Bench work can usually be facilitated by studying isolated domains; e.g., if some part of a protein makes the molecule hard to crystallize, the relevant information (if available) could indicate which part to remove. Information derived from domain inference tools can serve to divide a sequence of interest into meaningful pieces.

Bioinformatics work may benefit similarly, e.g., during database searches: assume, for example, that a protein includes a general hydrolase domain (e.g., an esterase), which is found in many combinations with other domains that particularize its use, and that it also contains a domain specific to the family to which this sequence belongs. It is the latter that will boost the most relevant sequences to the top of the sorted list of BLAST results; accordingly, it is the one that will drive the query protein to the correct subfamily within the framework of a larger family.

15. In the case of multidomain proteins, each hit to a constituent domain (or a significant part of it) signifies the existence of a related part in the databank. Occasionally, some domains will appear to be missing: either the relevant part of the sequence appears deleted, or an expected domain is not recognized along it. Given the statistical nature of the recognition procedure and the nucleotide nature of the underlying primary data, the tempting conclusion that this domain/part is absent is by no means secure.

• If the relevant part of the sequence is present, you may check whether domains recognized by domain inference programs along remote homologues of this part can be transferred by means of alignment against preformed multiple alignments, as described in Subheading 3.2.3 for the case of remote hits.

• If the relevant part of the sequence seems absent, then, despite the efficiency of genetic data manipulation procedures, parts of the sequence may have been accidentally considered introns. Once some major part of a multidomain protein has been located on the complete genome, the hits should serve as pointers to the location to be searched more carefully. Perhaps the next generation of data-mining tools will perform this retro-search for missing parts automatically (as iterative BLAST is performed today). Until then, and despite the availability of high-level annotation (which will retrieve the major part of the information being hunted), one should be ready to run a straightforward TBLASTN search with minor parts of the sequence in hand, to settle whether they exist conclusively and beyond reasonable doubt.

16. When an experimentally determined 3D structure of a similar sequence exists in PDB, the sequence of interest and the matching structure can be submitted to an automated model-building server (such as the SwissModel server; some servers may also need a ready-made alignment between the two) to obtain an approximate 3D structure of the query protein. If nothing else, inspection of this model can help explain any available mutational data and reveal key locations for experimentation by site-directed mutagenesis and other kinds of modification and probing (instead of blind trials along the sequence), in order to infer the mechanism of function or other valuable information. If the quality of the alignment is poor, but both the sequence and the structure can be aligned to, e.g., a profile, this intermediary link can mediate the alignment between the protein of interest and the distantly related sequence of known structure. Alternatively, the remote match may serve as the query to retrieve further sequences homologous to the hit, so that the original query can be aligned to their preformed multiple alignment, as described under Subheading 3.2.3.

17. The expectation value (E-value) provided by BLAST with the sorted hit list depends on the product of the length of the database and the length of the query. Assuming that matching counterparts exist for just one of the domains, and that this domain comprises a small part of the total protein, BLAST may miss hits of marginal similarity simply because the length product was unnecessarily large (unnecessarily, thanks to domain independence).
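The length dependence can be seen in the Karlin–Altschul expression used for BLAST statistics, E = K·m·n·e^(−λS), with m and n the query and database lengths and S the raw alignment score. In the Python sketch below, λ = 0.267 and K = 0.041 are assumed values of the kind BLAST reports for BLOSUM62 with gap costs 11/1; the database size is hypothetical, and BLAST's edge-effect length correction is omitted.

```python
import math


def evalue(score, query_len, db_len, lam=0.267, k=0.041):
    """Karlin-Altschul expectation E = K * m * n * exp(-lambda * S).

    lam and k are assumed values (BLOSUM62, gaps 11/1 style); real BLAST also
    shortens m and n by an edge-effect correction, omitted here.
    """
    return k * query_len * db_len * math.exp(-lam * score)


if __name__ == "__main__":
    db = 10 ** 8                       # hypothetical database size in residues
    whole = evalue(100, 600, db)       # same raw score, whole 600-residue protein as query
    domain = evalue(100, 120, db)      # ... and only its 120-residue domain as query
    print(round(whole / domain, 6))    # 5.0
```

For the same alignment score, querying with only the 120-residue domain lowers E fivefold, which can be enough to lift a marginal hit over a reporting cutoff.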

18. The expectation value should be regarded as only a rough measure. It would be a more accurate measure of the expected number of hits if databases were nonredundant (i.e., contained absolutely nonhomologous sequences) and if there were no biases toward specific types of amino acid residues or toward sequence patterns (e.g., the amphipathic ones found in α-helices, which account for about one quarter of protein structure in general). Besides, Sander and Schneider (22) showed long ago that, as soon as a subalignment of a given length exceeds a corresponding level of identity, 3D structural similarity can be assumed, independently of the length of the proteins participating in the comparison or of the number of sequences the query is compared to. They suggest a threshold of $t(L) = 290.15 \times L^{-0.562}$ (percent identity, for an alignment of length $L$) for $L < 80$, and of about 27% for $L > 80$; alignments with an identity level higher than $t(L)$ imply related structure, allowing only a small number of false positives. Alignments lying below the curve defined by this equation do not necessarily signify proteins of unrelated structure; for them, structural similarity, if it exists, cannot simply be asserted with confidence, and similarity becomes more and more improbable as the relevant figures decrease.
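The threshold curve as transcribed in this note can be evaluated with a short Python function. It encodes only what the note states, the power law below L = 80 and a plateau of about 27% beyond it; the function names are hypothetical.

```python
def identity_threshold(length, plateau=27.0):
    """Percent-identity threshold t(L) as transcribed in Note 18.

    t(L) = 290.15 * L ** -0.562 for L < 80; a plateau (about 27% per the text) otherwise.
    """
    if length <= 0:
        raise ValueError("alignment length must be positive")
    if length < 80:
        return 290.15 * length ** -0.562
    return plateau


def implies_related_structure(identity_percent, length, plateau=27.0):
    """True when the observed identity lies above the threshold curve."""
    return identity_percent > identity_threshold(length, plateau)


if __name__ == "__main__":
    print(round(identity_threshold(50), 1))   # 32.2
    print(implies_related_structure(40, 50))  # True
    print(implies_related_structure(30, 50))  # False
```

Evaluating the power law at L = 79 gives about 24.9%, slightly below the 27% plateau quoted here, so the curve as transcribed has a small step at L = 80; the plateau is kept as a parameter for that reason.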

19. Details of how to make or use a PSSM may change with the implementation; it is worth spending some time on the online help about PSSMs in the NCBI implementation. In any case, Clustal (23) may be used to align a sequence to a block of prealigned sequences, or even to align two preformed multiple alignments to each other. In both cases, if positions conserved in the "reference" block are also conserved along the query sequence (or the query block), the match is reliable. Pfam (15) offers the tools for another approach involving hidden Markov models, the explanation of which is beyond the scope of these notes.

20. Following the results of Henikoff and Henikoff (24,25), about 10 homologues usually seem to be enough, with the reservation that they should cover, if possible, the whole range of similarity from 90% down to 40–30%. If all of them are too similar to each other, it will be as if the same sequence had been included 10 times; if all of them are too dissimilar to each other, the risk of errors in their multiple alignment will be too high.

21. As a check, if a hit is correct, some of the sequences homologous to the hit should also have appeared in the hit list of the initial search (i.e., the one in which the protein of interest was the query sequence). If just one protein from a large family was reported, chances are that the hit was coincidental.
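The threshold curve in Note 18 is easy to evaluate directly. The Python sketch below encodes it as a function of the aligned length L; the plateau for longer alignments is taken from the note (about 27%) and exposed as a parameter, and the helper names `hssp_threshold` and `structurally_related` are our own, not part of any published tool.

```python
def hssp_threshold(length: int, plateau: float = 27.0) -> float:
    """Percent-identity threshold t(L) for an alignment of `length` residues.

    Uses t(L) = 290.15 * L**-0.562 for L < 80 and a flat plateau otherwise.
    The plateau (default 27%, as quoted in Note 18) is an adjustable assumption;
    treating L == 80 as part of the plateau is also our choice.
    """
    if length <= 0:
        raise ValueError("alignment length must be positive")
    if length < 80:
        return 290.15 * length ** -0.562
    return plateau


def structurally_related(identity_pct: float, length: int) -> bool:
    """True if the alignment lies above the curve, i.e. related structure can be assumed.

    False does not mean the structures are unrelated; it only means that
    similarity cannot be asserted with confidence.
    """
    return identity_pct > hssp_threshold(length)
```

For example, a 50-residue alignment needs roughly 32% identity and a 20-residue one roughly 54%. The power law itself gives about 25% just below L = 80, so with a 27% plateau the curve steps up slightly at L = 80; set `plateau` to match whichever published value you follow.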
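Notes 19 and 20 describe checking a query against a block of aligned homologues. As an illustration of the underlying idea, and not of the NCBI implementation, the sketch below builds a simple log-odds PSSM from a gap-free block, assuming a uniform amino acid background and a flat pseudocount, and scores an ungapped query against it; a query that keeps the block’s conserved residues scores high. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(block: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Build a log2-odds PSSM from a gap-free block of aligned sequences.

    Assumptions: uniform background (1/20 per residue) and the same pseudocount
    added for every residue in every column. Production tools such as PSI-BLAST
    typically use sequence weighting and matrix-derived pseudocounts instead.
    """
    length = len(block[0])
    if any(len(seq) != length for seq in block):
        raise ValueError("all sequences in the block must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(block) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in block)
        # Smoothed frequency of each residue in this column, relative to background.
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm: List[Dict[str, float]], query: str) -> float:
    """Sum the column scores for an ungapped query as long as the block."""
    if len(query) != len(pssm):
        raise ValueError("query length must match the PSSM length")
    return sum(column[aa] for column, aa in zip(pssm, query))
```

With only a handful of sequences the pseudocount dominates rare residues, which is one reason Note 20 recommends roughly 10 homologues spanning a wide range of similarity.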

References<br />

1. Richardson J.S. and Richardson D.C. (1989) “Principles and patterns of protein<br />

conformation.” In: Fasman G. (ed) “Prediction of Protein Structure and the<br />

Principles of Protein Conformation.” Plenum Press, NY, pp 1–98.


366 Paliakasis et al.<br />

2. Orengo C.A. and Thornton J.M. (2005) “Protein families and their evolution – a<br />

structural perspective.” Annu. Rev. Biochem. 74, 867–900.<br />

3. Paliakasis C.D. and Kokkinidis M. (1992) “Relationships between sequence and structure for the four-α-helix bundle tertiary motif in proteins.” Protein Eng. 5, 739–748.

4. Lattman E.E., Fiebig K.M. and Dill K.A. (1994) “Modeling compact denatured<br />

states in proteins.” Biochemistry 33, 6158–6166.<br />

5. Lupas A., Van Dyke M. and Stock J. (1991) “Predicting coiled-coils from protein sequences.” Science 252, 1162–1164.

6. Chothia C. (1992) “One thousand families for the molecular biologist.” Nature<br />

357, 543–544.<br />

7. Schwede T., Kopp J., Guex N. and Peitsch M.C. (2003) “SWISS-MODEL: an automated protein homology modeling server.” Nucleic Acids Res. 31, 3381–3385.

8. Burge C. and Karlin S. (1997) “Prediction of complete gene structures in human<br />

genomic DNA.” J. Mol. Biol. 268, 78–94.<br />

9. Hubbard T., Andrews D., Caccamo M., et al. (2005) “Ensembl 2005.” Nucleic<br />

Acids Res. 33, D447–D453.<br />

10. Altschul S.F., Madden T.L., Schäffer A.A., Zhang J., Zhang Z., Miller W. and<br />

Lipman D.J. (1997) “Gapped BLAST and PSI-BLAST: a new generation of protein<br />

database search programs.” Nucleic Acids Res. 25, 3389–3402.<br />

11. Bairoch A., Apweiler R., Wu C.H., Barker W.C., Boeckmann B., Ferro S.,<br />

Gasteiger E., Huang H., Lopez R., Magrane M., Martin M.J., Natale D.A.,<br />

O’Donovan C., Redaschi N. and Yeh L-S.L. (2005) “The universal protein resource<br />

(UniProt).” Nucleic Acids Res. 33, D154–D159.<br />

12. Berman H.M., Westbrook J., Feng Z., Gilliland G., Bhat T.N., Weissig H.,<br />

Shindyalov I.N. and Bourne P.E. (2000) “The protein data bank.” Nucleic Acids<br />

Res. 28, 235–242.<br />

13. Boguski M.S., Lowe T.M.J. and Tolstoshev C.M. (1993) “dbEST – database for<br />

expressed sequence tags.” Nature Genet. 4, 332–333.<br />

14. Apic G., Gough J. and Teichmann S.A. (2001) “Domain combinations in archaeal, eubacterial and eukaryotic proteomes.” J. Mol. Biol. 310, 311–325.

15. Bateman A., Coin L., Durbin R., Finn R.D., Hollich V., Griffiths-Jones S., Khanna<br />

A., Marshall M., Moxon S., Sonnhammer E.L.L., Studholme D.J., Yates C. and<br />

Eddy S.R. (2004) “The Pfam protein families database.” Nucleic Acids Res. 32,<br />

D138–D141.<br />

16. Letunic I., Copley R.R., Pils B., Pinkert S., Schultz J. and Bork P. (2006) “SMART<br />

5: domains in the context of genomes and networks.” Nucleic Acids Res. 34,<br />

D257–D260.<br />

17. The InterPro Consortium; Mulder N.J., Apweiler R., Attwood T.K., et al. (2005) “InterPro, Progress and Status in 2005.” Nucleic Acids Res. 33, D201–D205.

18. Madera M., Vogel C., Kummerfeld S.K., Chothia C. and Gough J. (2004) “The SUPERFAMILY database in 2004: additions and improvements.” Nucleic Acids Res. 32, D235–D239.



19. Attwood T.K., Bradley P., Flower D.R., Gaulton A., Maudling N., Mitchell A.L., Moulton G., Nordle A., Paine K., Taylor P., Uddin A. and Zygouri C. (2003) “PRINTS and its automatic supplement, prePRINTS.” Nucleic Acids Res. 31, 400–402.

20. Hulo N., Bairoch A., Bulliard B., Cerutti L., de Castro E., Langendijk-Genevaux P.S., Pagni M. and Sigrist C.J.A. (2006) “The PROSITE database.” Nucleic Acids Res. 34, D227–D230.

21. Rawlings N.D., Morton F.R. and Barrett A.J. (2006) “MEROPS: the peptidase<br />

database.” Nucleic Acids Res. 34, D270–D272.<br />

22. Sander C. and Schneider R. (1991) “Database of homology-derived protein structures<br />

and the structural meaning of sequence alignment.” Proteins: Struct. Fun.<br />

Gen. 9, 56–68.<br />

23. Thompson J.D., Higgins D.G. and Gibson T.J. (1994) “CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.” Nucleic Acids Res. 22, 4673–4680.

24. Henikoff S. and Henikoff J.G. (1992) “Amino acid substitution matrices from<br />

protein blocks.” Proc. Natl. Acad. Sci. USA 89, 10915–10919.<br />

25. Henikoff S. and Henikoff J.G. (1993) “Performance evaluation of amino acid substitution matrices.” Proteins: Struct. Fun. Gen. 17, 49–61.


19<br />

Open-Source Platform for the Analysis of Liquid<br />

Chromatography-Mass Spectrometry (LC-MS) Data<br />

Matthew Fitzgibbon, Wendy Law, Damon May, Andrea Detter, and<br />

Martin McIntosh<br />

Summary<br />

The analysis of protein mixtures by liquid chromatography-mass spectrometry (LC-MS) requires tools for viewing and navigating LC-MS data, locating peptides in LC-MS data, and eliminating low-quality peptides. msInspect, an open-source platform, can carry out these steps for single experiments and can align and normalize peptide features in comparative studies with multiple LC-MS runs. In addition, msInspect can analyze quantitative studies with and without isotopic labels to generate peptide arrays.

Key Words: liquid chromatography-mass spectrometry; peptide identification;<br />

filtering; alignment; quantitation.<br />

1. Introduction<br />

msInspect is an open-source platform comprising algorithms and visualization tools that process liquid chromatography-mass spectrometry (LC-MS) data files to locate peptides in two dimensions [time and mass over charge (m/z)] and perform various analyses on them (1). msInspect can be used for:

• Visually inspecting LC-MS spectra and peptide features<br />

• Automatically locating peptide features in high mass accuracy MS spectra<br />

• Filtering peptide features by various quality measures<br />

• Quantitating label-free peptide features between experiments via alignment and<br />

normalization of the data to create a peptide array<br />

From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols<br />

Edited by: A. Vlahou © Humana Press, Totowa, NJ<br />

369


370 Fitzgibbon et al.<br />

• Identifying isotopically labeled pairs [e.g., isotope-coded affinity tagging (ICAT), stable isotope labeling with amino acids in cell culture (SILAC)] for quantitative peptide analysis within a single experiment

• Comparing and developing MS feature-finding algorithms<br />

msInspect implements multiple algorithms specifically designed for LC-MS<br />

data.<br />

The signal processing component exploits the two-dimensional nature of<br />

the data to identify coeluting isotopes and then groups them based on the<br />

similarity of the observed isotopic distributions to those of naturally occurring<br />

peptides. The alignment method estimates the underlying nonlinear mapping of<br />

retention times between experiments. The normalization approach (2) adapts<br />

methods developed for genomic arrays to accommodate natural variation of<br />

LC-MS signal intensities across runs. Ultimately, the goal of msInspect is to<br />

mine LC-MS data and to produce peptide arrays that can then be analyzed<br />

using tools traditionally applied to genomic arrays. msInspect also contains

a complete Accurate Mass and Time (AMT) analysis workflow (3). These<br />

analytical techniques combine LC-MS and LC-MS/MS data in order to expand<br />

peptide coverage and enhance the confidence of peptide identifications.<br />
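The retention-time alignment step can be pictured as estimating a mapping between the scan axes of two runs from features that are confidently matched in both. The sketch below uses piecewise-linear interpolation between such anchor pairs; it is a simplified stand-in chosen for illustration, not the algorithm msInspect uses, and `fit_scan_map` is a hypothetical name.

```python
from bisect import bisect_right
from typing import Callable, List, Tuple


def fit_scan_map(anchors: List[Tuple[float, float]]) -> Callable[[float], float]:
    """Map scans of run A onto the scan axis of run B (hypothetical helper).

    `anchors` are (scan_in_A, scan_in_B) pairs for features matched in both runs.
    Between anchors the map is linear; beyond the first/last anchor the end
    segments are extended linearly.
    """
    pts = sorted(anchors)
    if len(pts) < 2:
        raise ValueError("need at least two anchor pairs")
    xs = [a for a, _ in pts]
    ys = [b for _, b in pts]
    if len(set(xs)) != len(xs):
        raise ValueError("anchor scans in run A must be distinct")

    def mapping(scan: float) -> float:
        # Index of the right end of the segment containing `scan`,
        # clamped so that out-of-range scans use the first or last segment.
        i = min(max(bisect_right(xs, scan), 1), len(xs) - 1)
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (scan - x0) / (x1 - x0)

    return mapping
```

With anchors (100, 120), (500, 540) and (1000, 1100), scan 300 of run A maps to scan 330 of run B. A real implementation would also need to tolerate mismatched anchors, which simple interpolation does not.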

2. Materials<br />

To run msInspect, the Java Runtime Environment must be installed. To perform alignment of multiple runs, the R environment must also be installed. Both programs must be properly configured and on the computer’s PATH. Information on acquiring these software packages is provided in Subheading 2.1 below. Please contact your local IT support group for details on installing this software properly.

msInspect reads mass spectra from files in the open mzXML format (4). For<br />

background on mzXML and information about converting data from particular<br />

instruments to mzXML see Note 1.<br />

2.1. Software<br />

1. msInspect is written in platform-independent Java and requires that the Java<br />

Runtime Environment, version 1.5 or later, be installed and on the computer’s<br />

PATH. Installation of Java Runtime Environment will also install the latest<br />

version of Java Web Start, which will allow msInspect to be run without<br />

needing to explicitly install it or update it as new versions are released<br />

(see Note 2).<br />

a. Windows, Linux, and Solaris users can download “J2SE 5.0” from<br />

http://java.sun.com/j2se/1.5.0/download.jsp.<br />

b. Macintosh users running Mac OS X v10.4 or later can download Java from http://www.apple.com/support/downloads.


Open LC-MS Analysis Platform 371<br />

2. To align multiple runs into a peptide array, the R environment for statistical computing, version 2.1.0 or later, must be installed and on the computer’s PATH. R executables for various operating systems are available from http://www.r-project.org.
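Before launching msInspect from the command line or running alignments, it can help to confirm that both executables are visible on the PATH. A minimal Python check using the standard library’s `shutil.which` follows; the executable names `java` and `R` are assumptions about a typical installation, and `find_prerequisites` is a hypothetical helper.

```python
import shutil
from typing import Dict, Iterable, Optional


def find_prerequisites(programs: Iterable[str] = ("java", "R")) -> Dict[str, Optional[str]]:
    """Return the full path of each executable found on PATH, or None if it is missing."""
    return {name: shutil.which(name) for name in programs}


if __name__ == "__main__":
    for name, path in find_prerequisites().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

A `None` entry for `R` only matters if you plan to build peptide arrays from multiple runs.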

2.2. Hardware<br />

msInspect will run on any computer that supports the software listed in<br />

Subheading 2.1. For large input files, typical of high mass accuracy measurements,<br />

feature extraction can require several hundred megabytes of memory<br />

(see Note 3). msInspect has been tested on computers running Windows XP,<br />

GNU Linux, and Mac OS X with at least 1 GB of main memory.<br />

2.3. Data Files<br />

msInspect will open any version 2.0 mzXML file containing MS1 data. However, msInspect was designed using high-resolution liquid chromatography-electrospray ionization-time of flight mass spectrometer data, so it may not perform as well with an mzXML file from another type of mass spectrometer (e.g., a matrix-assisted laser desorption/ionization-time of flight mass spectrometer).

Sample mzXML files that may be used to follow all of the steps in Section 3 are available on the Web (see Note 4).

3. Methods<br />

3.1. Viewing and Navigating LC-MS Data

1. Launch msInspect from http://proteomics.fhcrc.org/download/tools/msInspect/viewer.jnlp by clicking on “Launch msInspect with Java Web Start.” “Fred Hutchinson Cancer Research Center” must be accepted as a trusted software publisher for the download to be completed.

2. Upon launching msInspect, the Open File dialog box will automatically open.<br />

Browse for the mzXML file to be viewed, select the file, and left click the<br />

Open button (see Note 5). You may load a different mzXML file by selecting<br />

File > Open from the main msInspect menu bar.<br />

3. The msInspect window (Fig. 1) contains several panes for viewing and navigating<br />

the MS run:<br />

a. An image of the MS run will be displayed in the Image Pane (the largest<br />

pane in the center of the msInspect window).<br />

b. The Properties Pane (left side of the window) will display detailed information from the loaded mzXML file. This pane will later be used to display details of individual peptide features. It can be hidden with Windows > Show/hide properties.

c. The Detail Pane is on the right side of the window and the Chart Pane is<br />

at the bottom part of the window. Each provides a more detailed view of a<br />

region of the spectrum. The Detail Pane provides a zoomed view of the area<br />

selected in the full Image Pane. The Chart Pane plots intensity versus m/z<br />

(to show the isotopes in a single scan) or intensity versus scan (to show the<br />

elution profile of a single isotope).<br />

4. Hold the mouse cursor over a location in the Image Pane. A floating tag will<br />

appear displaying the scan number and m/z coordinates of that position.<br />

5. Areas containing peptide features in the Image Pane will appear dark. Left click<br />

in a dark area of the image where there appear to be many peptide features as<br />

shown in Fig. 1.<br />

a. The Detail Pane (right) shows a detailed view of the area selected. Feature<br />

finding is automatically launched in this area, and after a few seconds of<br />

computation, detected peptide features are circled. Xs indicate the monoisotopic<br />

peaks in each feature (see Note 6).<br />

b. To see detailed information about a detected peptide, position the mouse cursor over the monoisotopic peak. A floating tag will display the scan number, m/z (followed by mass in parentheses), inferred charge state, intensity/background intensity/median intensity, and the first and last scan for the feature.

Fig. 1. msInspect window showing the Properties Pane (top left), Image Pane (top center), Detail Pane (top right), and Chart Pane (bottom).

c. The Chart Pane (bottom) displays the m/z spectrum for the scan corresponding<br />

to the vertical red line in the Detail Pane.<br />

6. Zoom in on features in the Chart Pane by highlighting a desired area. To do this, left click at the top left corner of the desired area and hold down the left mouse button while dragging the cursor down and to the right. When the mouse button is released, the chart will be redrawn to show a magnified view of the selected area (see Note 7). To restore the original chart, left click anywhere in the Chart Pane and drag the cursor up or to the left.

7. Select “elution” from the drop-down menu at the top of the Chart Pane to display<br />

an elution profile plot. This display shows peaks along the scan axis rather than the<br />

m/z axis. Note that the Detail Pane now displays a horizontal line corresponding<br />

to the m/z value for the profile as shown in Fig. 2.<br />

8. Zoom in on the Image Pane by right clicking in it and selecting a magnification value from the list (e.g., 200%).

Fig. 2. msInspect window displaying an elution profile plot in the Chart Pane and<br />

corresponding horizontal line in the Detail Pane.
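Step 5 and Note 7 rely on inferring charge from isotope spacing. Neighboring isotopic peaks of a peptide differ by about 1.00336 Da (the ¹³C–¹²C mass difference), so at charge z they lie about 1/z m/z units apart. The sketch below applies that rule and converts m/z to neutral mass with the standard [M + zH]z+ relationship; whether msInspect’s floating tag reports exactly this neutral mass is an assumption, and the function names are hypothetical.

```python
from typing import List

PROTON_MASS = 1.007276      # Da
ISOTOPE_SPACING = 1.003355  # Da, mass difference between 13C and 12C


def infer_charge(peak_mzs: List[float], max_charge: int = 6) -> int:
    """Infer charge from the m/z spacing of adjacent isotopic peaks (hypothetical helper).

    Isotopes are ~1.00336 Da apart, i.e. ~1/z m/z units apart at charge z, so
    z is the (rounded) reciprocal of the spacing. A single peak has no spacing
    and gets charge 0, like the "stray peaks" described in Note 7.
    """
    if len(peak_mzs) < 2:
        return 0
    mzs = sorted(peak_mzs)
    gaps = [b - a for a, b in zip(mzs, mzs[1:])]
    mean_gap = sum(gaps) / len(gaps)
    if mean_gap <= 0:
        return 0
    z = round(ISOTOPE_SPACING / mean_gap)
    return z if 1 <= z <= max_charge else 0


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass M of an [M + zH]z+ ion: M = z * (m/z) - z * proton mass."""
    return charge * (mz - PROTON_MASS)
```

For peaks at m/z 500.27, 500.77 and 501.27 the spacing is 0.5, giving charge 2 and a neutral mass of about 998.53 Da.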



3.2. Locating Peptides in LC-MS Data<br />

A Feature Set file, which lists all of the peptide features detected in a run,<br />

can be generated using one of the algorithms included in the platform (see<br />

Note 8).<br />

1. Under the Tools menu, select two-dimensional (2D) Peak Alignment. This is the default feature-finding algorithm and is recommended for most purposes.

2. To initiate feature finding, select Tools > Find All Features. This will bring up<br />

the Extract Features dialog box as shown in Fig. 3.<br />

3. In the “Save Features to File” field, enter (or browse for) a path and add a name<br />

for the new Feature Set file.<br />

4. Specify a scan range in the “Start Scan” and “End Scan” fields to limit feature<br />

finding to a subset of scans. By default, msInspect will attempt to find peptides<br />

in all scans (see Note 9).<br />

5. Left click the Find Features button to begin the feature finding process. As the file<br />

is processed, the status bar at the bottom of the msInspect window will display<br />

progress. For a large input file, processing may take upwards of 20–30 min.<br />

6. When processing is complete, features will be written to the specified<br />

output file and highlighted as colored crosses in the Image and Detail<br />

Panes. The status bar will display “Finding features complete. See file<br />

yourfilepath\yourfile.peptides.tsv.” Place the mouse cursor over one of the<br />

detected features to display a summary of its properties. Left click on the feature to<br />

view details in the Properties Pane (display by Windows > Show/hide Properties).<br />

Fig. 3. Extract Features dialog box.

7. Select Tools > Display Peptides… to open the Display Features dialog box (Fig. 4A), where the display can be customized:

a. Display or hide the colored crosses by checking or unchecking the box under<br />

the “Display” field.<br />

b. Change the color of the crosses by left clicking on the colored box under<br />

the “Color” field. A new color can be selected from a color palette.<br />

c. View the Feature Set browser by left clicking on the “…” button. This browser lists details of all peptides in the Feature Set. The list can be sorted and edited: comments can be added to a feature, features can be deleted, and the modified Feature Set file may be saved (see Note 10).

3.3. Filtering to Eliminate Low-quality Peptides<br />

Low-quality peptides can be removed in msInspect by applying user-specified filtering criteria (e.g., a minimum number of isotopic peaks detected). Removing low-quality peptides is particularly helpful when peptide arrays are to be generated (described in Subheading 3.4.1).

1. Select Tools > Display Peptides….<br />

2. Left click the Filter tab at the bottom of the Display Features dialog box. This<br />

tab displays several parameters by which features can be filtered.<br />

3. Set Min Charge = 1, Min Scans = 3, Min Intensity = 5, Max KL = 1.0, and Min Peaks = 2, as shown in Fig. 4A (see Note 11).

4. Left click the Apply button. The Detail Pane now shows only the features that<br />

meet these filtering criteria.<br />

5. Save the filtered Feature Set file over the original file by left clicking on the “…”<br />

button at the top right of the Display Features dialog box, then left clicking on<br />

the Save button.<br />
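The filter in step 3 is a conjunction of simple thresholds. The Python sketch below expresses it over a hypothetical `Feature` record; the field names are illustrative and do not reproduce the column headers of msInspect’s Feature Set file, and treating every bound as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Feature:
    """A detected peptide feature. Field names are hypothetical, not msInspect's TSV headers."""
    mz: float
    charge: int
    scans: int        # number of scans the feature spans
    intensity: float
    kl: float         # fit to the model isotope distribution (lower is better)
    peaks: int        # number of isotopic peaks detected


def passes_filter(f: Feature, min_charge: int = 1, min_scans: int = 3,
                  min_intensity: float = 5.0, max_kl: float = 1.0,
                  min_peaks: int = 2) -> bool:
    """Defaults mirror step 3; inclusive bounds are an assumption."""
    return (f.charge >= min_charge
            and f.scans >= min_scans
            and f.intensity >= min_intensity
            and f.kl <= max_kl
            and f.peaks >= min_peaks)


def filter_features(features: List[Feature], **criteria) -> List[Feature]:
    """Keep only the features that meet every criterion."""
    return [f for f in features if passes_filter(f, **criteria)]
```

A stray peak (charge 0, one peak) fails both the charge and the peak-count criteria, matching the rationale in Note 11.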

3.4. Quantitation of Peptide Features<br />

3.4.1. Quantitation Using Label-free Approaches<br />

Features from multiple experiments can be compared in msInspect by simultaneously opening Feature Set files from multiple LC-MS runs, displaying them together, and generating a peptide array. Below are directions for multiple LC-MS run comparisons after Feature Set files have been produced (as described above in Subheadings 3.1–3.3) for all LC-MS runs to be compared.

1. Select Tools > Display Peptides….<br />

2. Left click on the Add Files button (Fig. 4A).<br />

Fig. 4. (A) Display Features dialog box with one file loaded and the Filter tab selected. (B) Display Features dialog box with two files loaded and the Peptide Array tab selected.

3. Browse to find another Feature Set file (with file extension .peptides.tsv) and open it. A different colored cross is assigned in the Image Pane to the features from each newly opened file. In this way, multiple Feature Set files can be opened and overlaid in the Image Pane (see Note 12).

4. Left click on the Filter tab (Fig. 4A) at the bottom of the Display Features<br />

dialog box and make sure the filter criteria are still set to the values entered<br />

in Subheading 3.3 (Min Charge = 1, Min Scans = 3, Min Intensity = 5, Max<br />

KL = 1.0, and Min Peaks = 2). Left click on the Apply button if any changes are<br />

made.<br />

5. Left click on the Peptide Array tab (Fig. 4B) to set criteria for the peptide array<br />

to be generated:<br />

a. Enter a name for the peptide array file that will be generated. By convention, this file name should end with “.pepArray.tsv.”

b. Click the Optimize button to have msInspect search for reasonable tolerances<br />

for matching features across runs (see Note 13).<br />

c. Check the Normalization box if normalization of features is desired (2).<br />

d. Click the Calculate button to actually compute the peptide array.<br />

6. The generated peptide array file consists of one column of intensities for each run<br />

and one row for each matched feature. The file is stored in a simple tab-delimited<br />

format, which can be exported (to Excel and other programs) and analyzed using<br />

tools traditionally applied to genomic arrays (see Note 14).<br />
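Conceptually, building a peptide array means pairing features across runs whose masses agree within a mass tolerance and whose (aligned) scans fall within a scan window, the two values that the Optimize button searches for (Note 13). The greedy sketch below illustrates that idea for two runs; the tuple layout, the default tolerances, and the greedy strategy are assumptions for illustration, not msInspect’s matching algorithm.

```python
from typing import List, Optional, Tuple

# Hypothetical feature layout: (neutral mass in Da, aligned scan, intensity)
Feature = Tuple[float, float, float]
Row = Tuple[Optional[float], Optional[float]]


def match_runs(run_a: List[Feature], run_b: List[Feature],
               mass_tol_ppm: float = 20.0, scan_window: float = 50.0) -> List[Row]:
    """Greedily pair features from two runs into peptide-array rows.

    Each row is (intensity in run A, intensity in run B); None marks a feature
    not found in that run. Run A features are visited from most to least intense,
    and each run B feature is used at most once.
    """
    rows: List[Row] = []
    used_b = set()
    for mass_a, scan_a, inten_a in sorted(run_a, key=lambda f: -f[2]):
        best = None  # (index in run_b, ppm error)
        for j, (mass_b, scan_b, _) in enumerate(run_b):
            if j in used_b:
                continue
            ppm = abs(mass_b - mass_a) / mass_a * 1e6
            if ppm <= mass_tol_ppm and abs(scan_b - scan_a) <= scan_window:
                if best is None or ppm < best[1]:
                    best = (j, ppm)
        if best is None:
            rows.append((inten_a, None))
        else:
            used_b.add(best[0])
            rows.append((inten_a, run_b[best[0]][2]))
    # Features seen only in run B become rows with a missing run A value.
    rows.extend((None, f[2]) for j, f in enumerate(run_b) if j not in used_b)
    return rows
```

Each returned row corresponds to one line of a two-run peptide array, with `None` where a feature was not detected in that run.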

3.4.2. Quantitation Using Isotopic Labeling<br />

A common method of relative quantitation of peptides involves applying heavy and light isotopic labels separately to two samples, then mixing them prior to collecting LC-MS data. Typically, tandem mass spectrometry (MS/MS, or MS2) experiments are used to analyze these labeled samples. Peptide sequencing by MS/MS can determine the number of labeled residues in each peptide and therefore the expected mass difference between the light and heavy forms of each peptide.

msInspect can perform relative quantitation even in the absence of MS/MS<br />

information. Provided with the mass of the light and heavy reagents and with a<br />

threshold on the number of labeled residues to consider, msInspect will search<br />

for pairs of features consistent with isotopic labeling.<br />

1. Open the file to be analyzed as described in Subheading 3.1.<br />

2. Select Tools > Find All Features.<br />

3. This will again bring up the Extract Features dialog box as shown in Fig. 3.<br />

Enter a new output file name and select a scan range of interest as described in<br />

Subheading 3.2.3–3.2.4.<br />

4. Note the “Quantitate” check box in this dialog. Selecting this box will enable<br />

several options for relative quantitation.



5. Select one of several common isotopic labeling strategies (e.g., Cleavable ICAT and ¹⁶O/¹⁸O) from the pull-down menu. Details can be entered, including the masses of the light and heavy label reagents, the particular amino acid labeled, and the maximum number of labeled residues to consider.

6. Left click on the “Find Features” button to locate all features in the specified<br />

scan range. Display features from the Feature Set file as described in Subheading<br />

3.2.7. An additional matching step is performed to locate isotopically labeled<br />

pairs. A pair is indicated by a vertical bar connecting the light and heavy partners<br />

in the Detail Pane. Selecting a pair by left clicking in the Detail Pane will display<br />

feature properties including the light and heavy intensities, the ratio of light to<br />

heavy, and the number of isotopic labels detected.<br />

7. The results of this quantitation process are stored in a tab separated value (TSV)<br />

file specified in step 3.4.2.3. One record is written for each isotopically labeled<br />

pair and for each unlabeled peptide (see Note 15).<br />
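The pair search described above can be sketched as follows: for each feature, look for a partner with the same charge, overlapping elution, and a mass shift equal to a whole number of label increments. The tuple layout, the 0.02 Da tolerance, and the example shift of 9.0302 Da (nine ¹³C substitutions, 9 × 1.00335 Da, assumed here to correspond to one cleavable ICAT label) are illustrative assumptions rather than msInspect’s settings.

```python
from typing import List, Tuple

# Hypothetical layout: (neutral mass in Da, charge, first scan, last scan, intensity)
Feature = Tuple[float, int, int, int, float]


def find_label_pairs(features: List[Feature], label_delta: float,
                     max_labels: int = 3, mass_tol: float = 0.02
                     ) -> List[Tuple[Feature, Feature, int, float]]:
    """Return (light, heavy, number of labels, light/heavy ratio) tuples.

    A heavy partner must have the same charge, an overlapping scan range, and
    a mass n * label_delta Da above the light feature (within mass_tol) for
    some n in 1..max_labels. A feature may appear in more than one pair.
    """
    pairs = []
    for i, light in enumerate(features):
        m_l, z_l, first_l, last_l, int_l = light
        for j, heavy in enumerate(features):
            m_h, z_h, first_h, last_h, int_h = heavy
            if i == j or z_h != z_l:
                continue
            if first_h > last_l or first_l > last_h:  # elution ranges do not overlap
                continue
            for n in range(1, max_labels + 1):
                if abs((m_h - m_l) - n * label_delta) <= mass_tol:
                    pairs.append((light, heavy, n, int_l / int_h))
                    break
    return pairs
```

Because the number of labels is guessed from mass alone, several values of n can fit by chance in a dense run, which is why Note 15 treats such pairs as suggestive.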

4. Notes<br />

1. More information on the mzXML file format, as well as utilities to convert<br />

native acquisition files from many common MS instruments to mzXML, can be<br />

found on the Sashimi website at http://sashimi.sourceforge.net.<br />

2. Running msInspect via Java Web Start is highly recommended for casual use,<br />

as it greatly simplifies installation and update of the software. msInspect’s<br />

major features, such as feature finding and peptide array creation, are available<br />

from the command line as well, and command-line use is more appropriate<br />

for batch processing of large numbers of mzXML files. To use msInspect<br />

from the command line, the stand-alone JAR file can be downloaded from<br />

http://proteomics.fhcrc.org/CPL/msinspect.html. This web page also allows<br />

download of the msInspect user’s guide, which contains detailed instructions on<br />

installation, using msInspect’s features from the command line, and full source<br />

code for the released version (5).<br />

3. Feature extraction can require a great deal of memory since it operates on several scans at a time. By default, the Java Web Start version of msInspect allows up to 384 MB of memory to be allocated so that a number of scans and intermediate results may be cached. If additional memory is available on the computer, the amount of memory accessible to msInspect may be increased by running msInspect from the command line and passing the “-Xmx” option to Java, for example: java -Xmx512M -jar viewerApp.jar

4. Sample data files are available at https://proteomics.fhcrc.org/CPAS. From that website, follow the “Published Experiments” link on the lower left side and then left click on the “MiMB Clinical Proteomics” link on the left side. Because LC-MS files can be quite large, the samples provided for download are only small subregions of the files used for the figures in Section 3. Some browsers, such as Internet Explorer, may add a “.mzXML.xml” suffix when downloading these files. This should not affect msInspect’s ability to read the files, and the suffix may safely be changed to “.mzXML” if desired.

5. The first time a particular mzXML file is loaded, msInspect will write a “.inspect”<br />

file in the same directory where the mzXML file is located. This file contains an<br />

index of each scan in the original file, which will speed subsequent file access.<br />

Construction of this index file can take some time for larger input files; the<br />

status bar at the bottom of the msInspect window will indicate progress.<br />

6. The area shown in the Detail Pane is indicated in the main Image Pane by a blue

rectangle. Several aspects of Detail Pane behavior can be adjusted by selecting<br />

Detail Pane Settings from the Tools menu. There, feature detection can be turned<br />

on or off, background noise that falls below a threshold can be hidden, and the<br />

color scheme of the Detail Pane can be modified.<br />

7. Note that in Fig. 1 the Chart Pane clearly shows individual isotopic peaks<br />

because the data is from a high-resolution instrument (in this case a Waters<br />

LCT Premier). msInspect depends on resolving individual isotopes to infer the<br />

charge state of the peptide and therefore its mass. The charge is derived from<br />

the reciprocal of the distance between adjacent peaks. In Fig. 1 the peaks of<br />

the peptide on the left side of the Chart Pane are 0.5 m/z units apart, therefore<br />

msInspect infers that this peptide has a charge of 2. It is not possible to infer a<br />

charge for a single peak, so “stray peaks” that cannot be grouped into an isotopic<br />

cluster are assigned a charge of zero.<br />

8. msInspect includes a number of feature extraction algorithms, which can be<br />

selected in the Tools menu. The default, two dimensional (2D) peak alignment,<br />

is recommended for most purposes. The single scan algorithm may be useful<br />

if there is little or no scan-to-scan coherence. The feature extraction algorithms<br />

in msInspect have been designed to work on high-resolution profile mode data.<br />

The algorithms have been successfully applied to centroided data, but performance<br />

will depend on the particular centroiding algorithm used and on the noise<br />

characteristics of the run under consideration. For such data, the centroided scan<br />

algorithm may be appropriate.<br />

9. Once peptides have been located, some amount of visual curation is recommended.<br />

The Heat Map view (accessed from the Tools menu) can provide a<br />

global view of features grouped by charge state and sorted by various metrics<br />

such as mass or intensity. Each column in the Heat Map view consists of a<br />

small intensity window around each feature, colored from low intensity (red) to<br />

high intensity (yellow). Clicking on a feature in the Heat Map will highlight it<br />

in the other windows. By sorting on KL score or intensity and inspecting a few<br />

features, one can gain a sense of what filtering criteria might be appropriate for a<br />

given data set. When new filter settings are applied, as described in Subheading<br />

3.3, the Heat Map view is automatically updated.<br />

10. A typical example of editing a Feature Set file:<br />

a. Sort by ascending KL score (left click on the “KL” column header).

b. Find a feature with KL < 1 that was misidentified by examining its spectrum<br />

in msInspect window’s Chart Pane.



c. Double click in the Description field for the feature to add a comment to<br />

the Feature Set List noting that this feature is “questionable.”<br />

d. Click “Save” to save changes by overwriting the old Feature Set file.<br />

11. Filtering peptide features can improve the performance of subsequent steps, such as the construction of peptide arrays. Specific filtering criteria will depend on the instrumentation and the goals of the experiment. The most frequently used filtering criteria include:

a. Minimum charge: msInspect locates features by first finding peaks and then grouping them into isotopic distributions consistent with individual peptides. Some peaks will not group with any others and are referred to as “stray peaks.” As described in Note 7, the charge state of these stray peaks cannot be inferred, so they are assigned a charge of zero. Setting the minimum charge to 1 when filtering will remove these stray peaks, which are often due to noise or chemical contaminants.

b. Minimum number of peaks: confidence in the location and charge-state assignment of a peptide feature may be greater if it is supported by more isotopic peaks. Setting the minimum number of peaks to 2 will also eliminate the stray peaks described above.

c. Minimum number of scans: the minimum number of scans that a peptide must span in order to be considered. This eliminates peptide features that persist for only a brief time.

d. Minimum intensity: setting a minimum intensity threshold is often appropriate, although the specific value used will depend on the instrument.

e. Maximum KL score: peaks are grouped by how well they match a model of the isotopic distribution of a peptide with a given mass. The KL score described in Bellew et al. (1) measures how much an extracted group of peaks deviates from this model. In general, a lower KL score indicates a better match.

12. When multiple feature sets are loaded, it is often useful to hide particular sets or to change the colors of the crosses that mark features in a given set. Both can be done in the Display Features dialog box, shown in Fig. 4A (select Tools > Display Peptides). For each feature set, this dialog box provides a checkbox to control visibility and a color palette to select the color of the crosses.

13. After optimization, the mass and scan window values that give the best alignment results automatically populate the Peptide Array tab.

14. A number of high-quality open-source tools are available for microarray analysis. Tools from the Bioconductor project (http://www.bioconductor.org) and from the TM4 microarray software suite (http://www.tm4.org) have been used to analyze peptide arrays produced by msInspect.

Open LC-MS Analysis Platform 381

15. Results from isotopic labeling should be treated as suggestive rather than authoritative. Without peptide sequence information, the mass difference between heavy and light partners cannot be definitively ascertained. The quality of the matching therefore depends on the quality of feature filtering and on the density of features in each run.
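Two of the notes above lend themselves to a short sketch: the filtering criteria of Note 11 and the light/heavy matching caveat of Note 15. The Python below is a hypothetical illustration, not msInspect code. The `PeptideFeature` record, its field names, every default threshold, and the `label_shift` value are placeholders. The KL score is computed as the Kullback–Leibler divergence of normalized observed peak intensities from a model isotope distribution, which is our assumption about the score described in (1). The matcher assumes one label per peptide. When a light feature has more than one candidate partner, the match is ambiguous, and this becomes more likely as runs get denser.

```python
import math
from dataclasses import dataclass


@dataclass
class PeptideFeature:
    """Hypothetical feature record; field names are illustrative, not msInspect's."""
    mass: float        # neutral monoisotopic mass (Da)
    scan: int
    charge: int        # 0 for "stray peaks" whose charge could not be inferred
    peaks: int         # isotopic peaks grouped into the feature
    scans: int         # number of scans the feature spans
    intensity: float
    kl: float          # fit to the model isotope distribution (lower is better)


def kl_divergence(observed, model):
    """KL(P || Q) between normalized observed (P) and model (Q) isotope intensities."""
    p_total, q_total = sum(observed), sum(model)
    return sum((o / p_total) * math.log((o / p_total) / (m / q_total))
               for o, m in zip(observed, model) if o > 0)


def filter_features(features, min_charge=1, min_peaks=2, min_scans=3,
                    min_intensity=0.0, max_kl=1.0):
    """Apply criteria a-e of Note 11; all default thresholds are placeholders."""
    return [f for f in features
            if f.charge >= min_charge and f.peaks >= min_peaks and f.scans >= min_scans
            and f.intensity >= min_intensity and f.kl <= max_kl]


def candidate_partners(features, label_shift, mass_tol=0.02, scan_window=50):
    """Map each light-feature index to the indices of its possible heavy partners."""
    pairs = {}
    for i, light in enumerate(features):
        matches = [j for j, heavy in enumerate(features)
                   if j != i and heavy.charge == light.charge
                   and abs(heavy.mass - light.mass - label_shift) <= mass_tol
                   and abs(heavy.scan - light.scan) <= scan_window]
        if matches:
            pairs[i] = matches
    return pairs


run = [
    PeptideFeature(1000.00, 100, 2, peaks=4, scans=12, intensity=5e4, kl=0.1),
    PeptideFeature(1006.00, 102, 2, peaks=3, scans=10, intensity=4e4, kl=0.2),
    PeptideFeature(1006.01, 110, 2, peaks=3, scans=8, intensity=3e4, kl=0.4),
    PeptideFeature(850.40, 300, 0, peaks=1, scans=5, intensity=8e4, kl=0.0),   # stray peak
]
kept = filter_features(run)                         # the stray peak is removed
pairs = candidate_partners(kept, label_shift=6.0)   # {0: [1, 2]}: an ambiguous match
```

Narrowing `scan_window` resolves this toy ambiguity. In real data, tighter filtering and sparser runs have a similar effect.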

Acknowledgments<br />

The authors would like to thank Matthew Bellew, Marc Coram, Jimmy Eng, Ruihua Fang, Mark Igra, and Tim Randolph for their intellectual contributions to the development of msInspect. This work was supported by contract #23XS144A from the National Cancer Institute.

References

1. Bellew, M., Coram, M., Fitzgibbon, M., Igra, M., Randolph, T., Wang, P., May, D., Eng, J., Fang, R., Lin, C.W., Chen, J., Goodlett, D., Whiteaker, J., Paulovich, A., and McIntosh, M. (2006) A suite of algorithms for the comprehensive analysis of complex protein mixtures using high-resolution LC-MS. Bioinformatics Advance Access published on June 9, 2006, http://bioinformatics.oxfordjournals.org/cgi/reprint/btl276v1.

2. Wang, P., Tang, H., Zhang, H., Whiteaker, J., Paulovich, A.G., and McIntosh, M. (2006) Normalization regarding non-random missing values in high-throughput mass spectrometry data. Proceedings of the Pacific Symposium on Biocomputing 11, 315–326.

3. May, D., Fitzgibbon, M., Liu, Y., Holzman, T., Eng, J., Kemp, C.J., Whiteaker, J., Paulovich, A., and McIntosh, M. (2007) A platform for accurate mass and time analyses of mass spectrometry data. Journal of Proteome Research 6(7), 2685–2694.

4. Pedrioli, P.G., Eng, J.K., Hubley, R., Vogelzang, M., Deutsch, E.W., Raught, B., Pratt, B., Nilsson, E., Angeletti, R.H., Apweiler, R., Cheung, K., Costello, C.E., Hermjakob, H., Huang, S., Julian, R.K., Kapp, E., McComb, M.E., Oliver, S.G., Omenn, G., Paton, N.W., Simpson, R., Smith, R., Taylor, C.F., Zhu, W., and Aebersold, R. (2004) A common open representation of mass spectrometry data and its application to proteomics research. Nature Biotechnology 22(11), 1459–1466.

5. Computational Proteomics Laboratory. msInspect website. Accessed on June 28, 2006 at http://proteomics.fhcrc.org/CPL/msinspect.html.


20

Pattern Recognition Approaches for Classifying Proteomic Mass Spectra of Biofluids

Ray L. Somorjai

Summary

This chapter discusses the statistical classification strategy that we have developed for analyzing biomedical magnetic resonance, infrared, and Raman spectra, particularly as it applies to proteomic mass spectra. It also gives a general discussion of the current use of pattern recognition methods, with caveats and suggestions relevant to clinical applicability.

Key Words: visualization; preprocessing; feature selection/extraction; robust classifier; classifier aggregation; proteomics; mass spectroscopy; magnetic resonance spectroscopy; biodiagnostics.

1. Introduction

Unlike magnetic resonance spectroscopy (MRS), infrared spectroscopy (IRS), and Raman spectroscopy (RS) (1,2,3), proteomic mass spectroscopy (PMS) is a relative newcomer to the field of biodiagnostics. However, for discriminating between diseases and disease states, it is a welcome complementary technique that provides yet another means of analyzing biofluids. In particular, this complementarity extends the range of biofluid characterization: from vibrational states of specific chemical groups (IRS, RS), through the identification of small molecules (MRS), to proteins and protein fragments (PMS).

As an emerging field, PMS suffers from growing pains. In particular, there are experimental difficulties specific to PMS that have yet to be addressed (see Note 1). In what follows, the author assumes that the spectra for which classifiers are to be developed have been properly “processed.”

From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols

Edited by: A. Vlahou © Humana Press, Totowa, NJ

Typically, biomedical data consist of relatively few samples (patterns), on the order of 10–100. These samples are initially presented in a very high-dimensional feature space (feature ≡ m/z intensity), with dimensionality L (dimension ≡ features) on the order of 1000–10,000. Unfortunately, these two characteristics lead to two curses that impede the development of robust classifiers: the curse of dimensionality and the curse of dataset sparsity (3). As a result, the sample-to-feature ratio (SFR) is only 1/10–1/1000. The machine learning community generally accepts a minimum SFR of 5–10 as necessary for robust classification.
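The SFR arithmetic, and one symptom of the dimensionality curse, can be checked with a few lines of NumPy. The sketch below is our own illustration, not part of the SCS. It fits a minimum-norm least-squares linear classifier to purely random labels. When features outnumber samples, such a classifier typically fits the training data perfectly even though the labels carry no information. This is exactly the kind of overfitting that a low SFR invites.

```python
import numpy as np


def sample_to_feature_ratio(n_samples, n_features):
    return n_samples / n_features


def random_label_training_accuracy(n_samples, n_features, seed=0):
    """Training accuracy of a linear least-squares classifier on noise data with random labels."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_features))
    y = rng.choice([-1.0, 1.0], size=n_samples)
    A = np.column_stack([X, np.ones(n_samples)])     # append a bias column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)     # minimum-norm solution when underdetermined
    return float(np.mean(np.sign(A @ coef) == y))


# Typical biomedical data: 10-100 samples, 1000-10,000 features
print(sample_to_feature_ratio(100, 1000), sample_to_feature_ratio(10, 10000))
print(random_label_training_accuracy(n_samples=20, n_features=1000))
```

Perfect training accuracy here says nothing about accuracy on new samples, which would be at chance level.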

In this chapter, the author presents the specific strategy [dubbed the statistical classification strategy (SCS)] that his group has developed over the last dozen years to deal with such problems, particularly as they apply to MR, IR, and Raman spectra. We have been adapting this strategy and applying it successfully to biomedical data derived from both proteomic mass spectra and microarrays (see Note 2). The author compares the SCS with the current tools of proteomics data analysts, noting differences and similarities, and makes recommendations wherever possible.

2. The Statistical Classification Strategy

Lifting the twin curses of high dimensionality and dataset sparsity requires special approaches. The “strategy” part of the SCS reflects the fact that no single approach is, or can be, optimal [“there are no panaceas in data analysis” (4)]. It also reflects the fact that a data-driven, multistage strategy is necessary, even essential. Following a divide-and-conquer philosophy, the SCS consists of five stages:

1. Data visualization

2. Preprocessing

3. Feature selection/extraction

4. Robust classifier development

5. Classifier aggregation (ensembles)

The five stages are, of course, intimately interrelated; in particular, we use the visualization stage to constantly monitor how well the other stages of the strategy are working. Figure 1 provides a flowchart of the SCS. A more detailed description of the SCS can be found in (5) (see Note 3).

Fig. 1. Flowchart for the five stages of the SCS: data visualization, preprocessing, feature selection/extraction, classifier development, and classifier aggregation.

2.1. Visualization of High-Dimensional Data

Proper data visualization is an essential first step. It requires a dimensionality-reducing mapping/projection from a typically very large, L-dimensional feature space to one to three dimensions. Of course, mapping from high dimensions to lower ones cannot preserve all distances exactly, because most of the original degrees of freedom are lost. However, if only class separability is required, exact visualization (our primary goal) is both achievable and sufficient. In fact, we recently proposed such an approach (6). It involves mapping high-dimensional patterns to a special plane, the relative distance plane (RDP).

The mapping procedure starts with the selection of a distance measure. This can range from Euclidean, city block, and maximum norm to Mahalanobis and its generalization (Anderson–Bahadur, AB) (7). Next, two reference patterns are chosen, one from each class. The RDP mapping relies on a critical observation: the distances from any other pattern to these two reference points are preserved exactly by the mapping. The reason is that the three points always form a triangle, in any dimension and for any distance metric, because every metric satisfies the triangle inequality. The three distances of such a triangle can therefore be displayed in two dimensions without distortion.

By cycling through all possible reference pairs, we can display and visualize the data with respect to these pairs, i.e., from a large number of possible “perspectives.” As an analogy, consider looking at a sculpture from every angle to assess its shape and form. This is a very powerful approach for detecting outliers (e.g., poor-quality spectra), discovering additional subgroups within a class (clustering), assessing whether training and test sets derive from the same distributions, and so on. In short, it serves for establishing and ensuring quality control.
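The distance-preserving property of the RDP can be sketched as follows. This is one possible construction under our own assumptions, not necessarily the exact procedure of (6). The two reference patterns are drawn at (0, 0) and (D, 0), where D is their distance. Every other pattern is placed in the upper half-plane so that its distances to both references are reproduced exactly. Because any metric satisfies the triangle inequality, the square root below is always real, up to rounding. The metric names are illustrative, and Mahalanobis/AB distances are omitted.

```python
import numpy as np

# Some of the distance measures mentioned in the text
METRICS = {
    "euclidean": lambda a, b: float(np.linalg.norm(a - b)),
    "cityblock": lambda a, b: float(np.abs(a - b).sum()),
    "maxnorm": lambda a, b: float(np.abs(a - b).max()),
}


def rdp_map(X, ref1, ref2, metric="euclidean"):
    """Place each row of X in 2-D so that its distances to ref1 and ref2 are exact.

    ref1 is drawn at (0, 0) and ref2 at (D, 0), where D = d(ref1, ref2).
    """
    dist = METRICS[metric]
    D = dist(ref1, ref2)
    coords = []
    for x in X:
        d1, d2 = dist(x, ref1), dist(x, ref2)
        u = (d1 ** 2 - d2 ** 2 + D ** 2) / (2 * D)    # position along the reference axis
        v = np.sqrt(max(d1 ** 2 - u ** 2, 0.0))       # height; clips tiny negative rounding
        coords.append((u, v))
    return np.array(coords), np.array([[0.0, 0.0], [D, 0.0]])


# Rows 0 and 1 stand in for one reference pattern from each class
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 200))
points, refs = rdp_map(X[2:], X[0], X[1], metric="cityblock")
```

Cycling `ref1` and `ref2` over all cross-class pairs produces the many "perspectives" described above.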

2.2. Preprocessing

Preprocessing enables the user to adapt, or “tune,” the data so that the subsequent stages of the SCS are optimized. For spectra, whether MS or MR, we have found that the most useful preprocessing approaches, alone or in combination, are:

- normalization (“whitening,” or scaling to unit area),
- smoothing (filtering), and/or
- peak alignment (with respect to some internal or external reference).

Various transformations of the spectra frequently lead to better classification. Examples include replacing the spectra by their (numerical) derivatives or by rank-ordered variants, as well as combinations of these. Rank ordering is nonlinear: it replaces the original features by their ranks, thus minimizing the influence of accidentally large or small feature values.

Furthermore, a multi-version workflow facilitates the aggregation of multiple classifiers, for possibly increased accuracy (stage 5). In this workflow, one creates differently preprocessed versions of the same dataset, selects different sets of features from these (stage 3), and develops different classifiers using these feature sets (stage 4). The accuracy and reliability of the resulting classifier are also assessed by visualizing the results (stage 1). This shows how the strategy uses the stages in an interactive, feedback fashion.
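The transformations listed above are easy to express with NumPy. The sketch below is illustrative only. The moving-average width, the use of `np.diff` as the numerical derivative, tie-breaking by position in the rank transform, and the function names are our assumptions, not the author's settings. It produces several differently preprocessed versions of one dataset, which is the starting point for the later stages.

```python
import numpy as np


def normalize_unit_area(spectra):
    """Scale each spectrum (row) so that its intensities sum to 1."""
    return spectra / spectra.sum(axis=1, keepdims=True)


def smooth_moving_average(spectra, width=5):
    """Moving-average smoothing along each spectrum (one of many possible filters)."""
    kernel = np.ones(width) / width
    return np.array([np.convolve(s, kernel, mode="same") for s in spectra])


def first_derivative(spectra):
    """Numerical first derivative by finite differences; one fewer point per spectrum."""
    return np.diff(spectra, axis=1)


def rank_transform(spectra):
    """Replace each intensity by its rank within its spectrum (1 = smallest)."""
    return np.argsort(np.argsort(spectra, axis=1), axis=1) + 1


# Several differently preprocessed versions of the same (synthetic) dataset
spectra = np.abs(np.random.default_rng(0).normal(size=(4, 50))) + 0.1
versions = {
    "area": normalize_unit_area(spectra),
    "smoothed": normalize_unit_area(smooth_moving_average(spectra)),
    "derivative": first_derivative(normalize_unit_area(spectra)),
    "rank": rank_transform(spectra),
}
```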

2.3. Feature Selection/Extraction

In general, this stage is one of the two most important components of the SCS. It is essential for dimensionality reduction, which helps lift the curse of dimensionality. When done properly, it also helps arrive at biologically relevant and transparent interpretations of the data (“biomarker” identification). The driving force behind feature selection/extraction (FSE) is to satisfy one of the two critical requirements for any reliable classifier development: lifting the curse of dimensionality.

Spectra, whether mass or MR, are peculiar: their “intrinsic dimensionality,” the number of independent, relevant features they possess, is generally much smaller than their original dimensionality. This is because spectra have many irrelevant features (“noise”), and adjacent features are strongly correlated. Some of these correlated features correspond to spectral peaks, representing small molecules (MRS), or small proteins, protein fragments, or peptides (PMS). Thus, it is clearly beneficial to eliminate irrelevant features and identify discriminatory peaks (potential “biomarkers”).

For spectra, principal component analysis (PCA) is doubly dangerous, even though it is a frequently used dimension reduction method (often the principal tool of many PMS data analysts). First, it “scrambles” the original features, making the identification and selection of discriminatory features problematic. Second, the principal components (PCs) are ordered by the amount of variance they explain in the data, so there is no guarantee that the first few PCs are discriminatory. Even if one were to choose the first M ≪ L PCs from the original, total L-term set, these are rarely the best discriminators. One could try selecting m < M PCs as optimal for classification (e.g., by exhaustive search). However, our early experience indicates that some of the good discriminators are among the remaining k = M + 1,…,L subset of PCs. All these difficulties point to the need for a feature selection method specific to spectral data, one that preserves spectral interpretability.
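The second danger of PCA is easy to reproduce with synthetic data. In the hypothetical two-feature example below, a high-variance feature carries no class information, while a low-variance feature separates the classes almost perfectly. PCA ranks the uninformative direction first, so the discriminatory information ends up in the last component. The separation index is our choice for illustration: the absolute difference in class means divided by the pooled within-class standard deviation.

```python
import numpy as np


def principal_component_scores(X):
    """Project mean-centered data onto its principal components, ordered by explained variance."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt.T


def separation(scores, y):
    """|difference of class means| / pooled within-class standard deviation, per column."""
    a, b = scores[y == 0], scores[y == 1]
    pooled_sd = np.sqrt((a.var(axis=0, ddof=1) + b.var(axis=0, ddof=1)) / 2)
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / pooled_sd


rng = np.random.default_rng(0)
n = 50                                                  # samples per class
y = np.repeat([0, 1], n)
noise = rng.normal(0.0, 10.0, 2 * n)                    # large variance, no class information
signal = np.where(y == 1, 1.0, -1.0) + rng.normal(0.0, 0.1, 2 * n)   # small variance, discriminatory
X = np.column_stack([noise, signal])

sep = separation(principal_component_scores(X), y)
print(sep)    # PC1 (largest variance) separates poorly; PC2 separates the classes well
```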

There are two generic approaches to feature selection (8). The filter method selects features without considering the classifiers to be used with them. The wrapper (embedding) method finds optimal features while using the eventual classifier to guide the selection. We have developed a genetic algorithm-based optimal region selection (GA-ORS) method that finds discriminatory features without losing spectral interpretability (9).

The GA-ORS is based on the wrapper approach and is an example of feature extraction. One advantage is that the spectral ranges it finds are averaged over adjacent data points, which is equivalent to determining peak areas. A further bonus is that such averaging increases the signal-to-noise ratio. Within the GA-ORS suite of programs, one can also control the widths of the selected spectral subregions (discriminatory peaks). This helps eliminate regions that appear discriminatory only because of accidental differences in “noise” regions, which arise from the limited sample size (9,10).
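A toy version of the wrapper idea behind GA-ORS is sketched below. It is not the author's GA-ORS software. The following are simplifications chosen for brevity:

- the chromosome encoding, a list of `(start, width)` regions,
- truncation selection,
- uniform crossover,
- random-redraw mutation, and
- the leave-one-out nearest-centroid fitness.

The sketch does show two properties described above: each selected region is averaged into a single feature, and region widths stay within user-set bounds (`min_w`, `max_w`).

```python
import numpy as np


def region_features(spectra, regions):
    """Average the intensities inside each (start, width) region: one feature per region."""
    return np.column_stack([spectra[:, s:s + w].mean(axis=1) for s, w in regions])


def loo_nearest_centroid_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-class-mean classifier (the wrapper's classifier)."""
    idx = np.arange(len(y))
    correct = 0
    for i in idx:
        train = idx != i
        c0 = X[train & (y == 0)].mean(axis=0)
        c1 = X[train & (y == 1)].mean(axis=0)
        pred = int(np.linalg.norm(X[i] - c1) < np.linalg.norm(X[i] - c0))
        correct += int(pred == y[i])
    return correct / len(y)


def random_region(rng, n_points, min_w, max_w):
    w = int(rng.integers(min_w, max_w + 1))
    return (int(rng.integers(0, n_points - w + 1)), w)


def ga_region_search(spectra, y, n_regions=1, min_w=3, max_w=15,
                     pop_size=20, generations=30, mutation_rate=0.3, seed=0):
    """Toy genetic search for n_regions spectral regions maximizing LOO accuracy."""
    rng = np.random.default_rng(seed)
    n_points = spectra.shape[1]

    def fitness(chromosome):
        return loo_nearest_centroid_accuracy(region_features(spectra, chromosome), y)

    population = [[random_region(rng, n_points, min_w, max_w) for _ in range(n_regions)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.choice(len(parents), size=2, replace=False)
            # uniform crossover: each region is inherited from one parent or the other
            child = [parents[a][k] if rng.random() < 0.5 else parents[b][k]
                     for k in range(n_regions)]
            # mutation: occasionally redraw a region (its width stays within [min_w, max_w])
            child = [random_region(rng, n_points, min_w, max_w)
                     if rng.random() < mutation_rate else region for region in child]
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return best, fitness(best)


# Synthetic spectra: class 1 has an extra peak at points 40-49
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 20)
spectra = rng.normal(0.0, 0.5, (40, 100))
spectra[y == 1, 40:50] += 5.0
best, fit = ga_region_search(spectra, y, n_regions=2, generations=10, seed=0)
```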

The GA-ORS has been very successful in identifying discriminatory subregions of MR, IR, and Raman spectra of biofluids and tissues, obtained for distinguishing between various diseases and disease states (1).

In the context of feature selection, many proteomic mass spectroscopists first identify “relevant” peaks as possible contributors to discrimination, sometimes in an ad hoc fashion. Using all available “domain knowledge” is very important and should always be considered. However, it can also introduce bias through preconceived notions of what is relevant for discrimination. Our feature selection approach, sketched above, removes most of this bias by identifying hitherto unsuspected, novel discriminatory “peaks” (or, more accurately, discriminatory spectral subregions). Furthermore, because it is explicitly multivariate, GA-ORS tends to identify a “fingerprint”: a “panel” of peaks whose simultaneous interaction is necessary for discrimination.

Some multidimensional feature spaces do not arise from spectra, e.g., microarray data or preselected discrete peaks in PMS. For these, averaging adjacent features is not meaningful, and direct application of the GA-ORS methodology may not be appropriate [although we have used it as a preliminary, clustering-type feature selection “trick” (5)]. However, a search for optimal or near-optimal discriminatory feature subsets is still feasible, and it is one of the options available in GA-ORS. The search is exhaustive when possible; otherwise it is based on dynamic programming.
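For non-spectral features, the exhaustive option mentioned above can be sketched in a few lines. This illustration rests on our own assumption that a leave-one-out nearest-centroid classifier serves as the wrapper criterion. It is not the GA-ORS implementation, and the dynamic programming variant is not shown. The number of candidate subsets, C(L, m), grows combinatorially, so exhaustive search is practical only for small L and m.

```python
import itertools

import numpy as np


def loo_nearest_centroid_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-class-mean classifier."""
    idx = np.arange(len(y))
    correct = 0
    for i in idx:
        train = idx != i
        c0 = X[train & (y == 0)].mean(axis=0)
        c1 = X[train & (y == 1)].mean(axis=0)
        pred = int(np.linalg.norm(X[i] - c1) < np.linalg.norm(X[i] - c0))
        correct += int(pred == y[i])
    return correct / len(y)


def exhaustive_subset_search(X, y, subset_size):
    """Evaluate every feature subset of the given size; return the best one and its accuracy."""
    best_subset, best_acc = None, -1.0
    for subset in itertools.combinations(range(X.shape[1]), subset_size):
        acc = loo_nearest_centroid_accuracy(X[:, list(subset)], y)
        if acc > best_acc:
            best_subset, best_acc = subset, acc
    return best_subset, best_acc


rng = np.random.default_rng(2)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 8))          # 8 discrete, non-spectral features
X[y == 1, 3] += 6.0                   # features 3 and 6 carry the class signal
X[y == 1, 6] += 6.0
subset, acc = exhaustive_subset_search(X, y, subset_size=2)
```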

Figure 2 demonstrates the importance of feature selection and the relevance of an interactive, feedback-mode visualization of data. For the two-class prostate cancer vs. healthy proteomic (mass spectral) dataset (11), we display a Euclidean distance-based mapping. The mapping is made either directly from the original 15,154 dimensions (left panel) or from five dimensions reduced via GA-ORS (right panel). Clearly, the success of class separation depends on the dimensionality of the feature space. The training set (TS) is shown as black disks for class 1 and black crosses for class 2; the test set (VS) as grey triangles for class 1 and grey squares for class 2. When mapping from the original 15,154 dimensions, the optimal two-dimensional separation misclassifies eight samples from the training set and nine from the independent test set. For the mapping from five dimensions, all samples are classified correctly (see Note 4).

Fig. 2. Prostate cancer: L2 mapping from 15,154 dimensions (left panel) and from 5 dimensions (right panel). Mapping from the original 15,154 dimensions misclassified eight samples from the training set (TS; class 1, black disks; class 2, black crosses) and nine from the independent validation (test) set (VS; class 1, grey triangles; class 2, grey squares). The mapping from five dimensions classified all TS and VS samples correctly. The dashed lines are the optimal LDA separators.
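Why the 15,154-dimensional mapping had to use the Euclidean distance (see Note 4) can be checked numerically. With fewer samples than features, the sample covariance matrix has rank at most N − 1 and cannot be inverted, so the Mahalanobis distance is undefined. The sizes below are arbitrary choices for illustration: 40 samples, 1000 features, and the first 5 features standing in for a selected subset.

```python
import numpy as np


def mahalanobis(x, mean, cov):
    """Mahalanobis distance; requires an invertible covariance matrix."""
    diff = x - mean
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))


rng = np.random.default_rng(0)
n_samples = 40

# Far more features than samples: the covariance has rank at most n_samples - 1
X_full = rng.normal(size=(n_samples, 1000))
rank_full = np.linalg.matrix_rank(np.cov(X_full, rowvar=False))

# After "selecting" a few features, the covariance is invertible and the distance is defined
X_sel = X_full[:, :5]
cov_sel = np.cov(X_sel, rowvar=False)
d = mahalanobis(X_sel[0], X_sel.mean(axis=0), cov_sel)
print(rank_full, round(d, 3))
```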

2.4. Robust Classifier Development

Supervised classifiers have two generally interrelated goals. First, we want robust classifiers, i.e., classifiers with high generalization power. A classifier achieves this when it classifies new, unknown “patterns” correctly and reliably. Second, we want to identify the smallest subset of maximally discriminatory features. Eventual disease management/treatment would benefit from having only a few biologically relevant, interpretable features. Ideally, both goals should be achieved, especially in clinically relevant studies.

Unfortunately, achieving the first goal frequently comes at the expense of the second. A good example is the recent use of support vector machines (SVMs) for classification. SVMs have become particularly popular because of their persuasive theoretical foundations (12,13) (see Note 5). However, SVMs project the data into even higher-dimensional feature spaces to achieve linear separability of the classes. This makes identifying relevant, discriminatory features more difficult.

The classifiers in use range in technical complexity and sophistication. At one end are the simplest correlation techniques; then come k-nearest neighbors, linear and quadratic discriminant analysis, decision trees, and neural nets; and at the other end are (nonlinear) SVMs. However, the choice of classifier seems to be dictated not by the data to be classified. Instead, it follows “expert” recommendation (usually based on other types of data), personal experience or preference, or simply software availability. The maxim “simpler is better” has mostly been ignored [see, however, (14)]. In general, no specific effort has been expended on choosing the most appropriate, optimal type of classifier for a given dataset. With a few exceptions, the proteomics (mass spectroscopy) community tends to use the “best” (i.e., the most sophisticated) classifier, whether appropriate or not!

If the dataset is sufficiently large, the optimal approach for developing a robust classifier is to partition the data into three parts: a training set, a monitoring set, and a completely independent test (validation) set. Such partitioning is required to prevent overfitting, which occurs when the classifier adapts itself too closely to the peculiarities of a training set with a limited number of samples. Using a monitoring set helps decide when to stop training. The ultimate assessment of the classifier’s generalization capability is how well it performs on the independent test set, which was in no way involved in creating the classifier.
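A stratified three-way partition of the kind described above might look like the sketch below. The 60/20/20 proportions and the per-class (stratified) splitting are our assumptions for illustration; the text does not prescribe specific fractions.

```python
import numpy as np


def stratified_three_way_split(y, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split sample indices into training, monitoring, and test sets, class by class."""
    rng = np.random.default_rng(seed)
    splits = ([], [], [])
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        n_train = int(round(fractions[0] * len(idx)))
        n_monitor = int(round(fractions[1] * len(idx)))
        splits[0].extend(idx[:n_train])
        splits[1].extend(idx[n_train:n_train + n_monitor])
        splits[2].extend(idx[n_train + n_monitor:])     # the remainder goes to the test set
    return tuple(np.sort(np.array(s, dtype=int)) for s in splits)


y = np.array([0] * 50 + [1] * 30)
train, monitor, test = stratified_three_way_split(y)
```

The test indices should be set aside before any feature selection or tuning, so that they stay "in no way involved in creating the classifier."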

Unfortunately, a sufficiently large sample size is a luxury rarely available to analysts of biomedical data. The only recourse is to use some version of cross-validation (CV) (15). CV comes in different flavors, each with its own advantages and disadvantages. All of them are designed to deal with the bias introduced by using the entire dataset both to develop the “optimal” classifier and to estimate the classification error (see Note 6).
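The resampling schemes described in Note 6 (leave-one-out, K-fold CV, and the bootstrap) can be sketched as follows. A nearest-centroid classifier stands in for whatever classifier is being assessed. The classifier, the 200 bootstrap replicates, and the out-of-bag evaluation of each bootstrap classifier are our assumptions. Refined bootstrap error estimates and the generalization mentioned in Note 6 are not shown.

```python
import numpy as np


def nearest_centroid_fit_predict(X_train, y_train, X_test):
    """Assign each test pattern to the class whose training mean is nearer."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = np.linalg.norm(X_test - c0, axis=1)
    d1 = np.linalg.norm(X_test - c1, axis=1)
    return (d1 < d0).astype(int)


def kfold_accuracies(X, y, k, seed=0):
    """Accuracy on each of K held-out folds; K = len(y) gives leave-one-out."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    accs = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        pred = nearest_centroid_fit_predict(X[train_idx], y[train_idx], X[test_idx])
        accs.append(np.mean(pred == y[test_idx]))
    return np.array(accs)


def bootstrap_oob_accuracy(X, y, n_boot=200, seed=0):
    """Mean accuracy over bootstrap replicates, each tested on its out-of-bag samples."""
    rng = np.random.default_rng(seed)
    n, accs = len(y), []
    for _ in range(n_boot):
        boot = rng.integers(0, n, size=n)            # resample with replacement
        oob = np.setdiff1d(np.arange(n), boot)       # samples not drawn this time
        if len(oob) == 0 or len(np.unique(y[boot])) < 2:
            continue                                 # skip degenerate replicates
        pred = nearest_centroid_fit_predict(X[boot], y[boot], X[oob])
        accs.append(np.mean(pred == y[oob]))
    return float(np.mean(accs))


rng = np.random.default_rng(5)
y = np.repeat([0, 1], 15)
X = rng.normal(size=(30, 4)) + 1.5 * y[:, None]
acc5 = kfold_accuracies(X, y, k=5)
loo = kfold_accuracies(X, y, k=len(y))               # leave-one-out
boot = bootstrap_oob_accuracy(X, y)
print(f"5-fold: {acc5.mean():.2f} +/- {acc5.std():.2f}; LOO: {loo.mean():.2f}; bootstrap: {boot:.2f}")
```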

It is important to re-emphasize that biomedical data typically have small sample sizes. For this reason, the best approach to robust classifier development is to select the simplest classifier possible, which suggests linear classifiers. Complex classifiers have too many parameters that need optimization, inevitably raising the specter of overfitting (see Note 7). Dimensionality reduction (FSE) is, of course, essential for obtaining an appropriate SFR, and recognizing the role of the SFR is important when developing classifiers. However, an essential caveat is that data sparsity can render any classification result statistically suspect, even if the SFR requirement is satisfied (3). The importance of guaranteeing an appropriate SFR is being recognized. The consequences of dataset sparsity, however, are still not appreciated (16).
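As a concrete example of the simplest classifiers recommended above, the sketch below implements two-class Fisher LDA with NumPy. Following the idea in Note 7, it then augments the features with quadratic and interaction terms while keeping the classifier linear. The small ridge term added to the within-class scatter matrix and the XOR-like test data are our assumptions for illustration; this is not the author's software.

```python
import itertools

import numpy as np


def fit_lda(X, y, ridge=1e-6):
    """Two-class Fisher LDA: w = Sw^-1 (m1 - m0), threshold halfway between the class means."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    Sw += ridge * np.eye(X.shape[1])           # guards against a singular scatter matrix
    w = np.linalg.solve(Sw, m1 - m0)
    b = -w @ (m0 + m1) / 2
    return w, b


def predict_lda(X, w, b):
    return (X @ w + b > 0).astype(int)


def augment_quadratic(X):
    """Append all squares and pairwise products x_i * x_j (interpretable interaction terms)."""
    pairs = itertools.combinations_with_replacement(range(X.shape[1]), 2)
    extra = [X[:, i] * X[:, j] for i, j in pairs]
    return np.column_stack([X] + extra)


# XOR-like data: the class depends on the sign of x1 * x2, which no straight line captures
rng = np.random.default_rng(0)
centers = rng.choice([-1.0, 1.0], size=(400, 2))
X = centers + rng.normal(0.0, 0.3, size=(400, 2))
y = (centers[:, 0] * centers[:, 1] > 0).astype(int)

lin_acc = np.mean(predict_lda(X, *fit_lda(X, y)) == y)
Xq = augment_quadratic(X)                      # 2 linear + 3 quadratic features
quad_acc = np.mean(predict_lda(Xq, *fit_lda(Xq, y)) == y)
```

The augmented model is still a linear classifier; only the feature set changed. Its weight on the `x1 * x2` column is directly readable as an interaction effect.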


When the dataset is imbalanced, classifiers tend to produce disparate sensitivities and specificities. Controlling this is of particular clinical relevance, because there are typically many more samples from normal subjects than from patients with particular diseases, and tuning methods are needed for the classifiers developed. The standard method in the pattern recognition literature is either oversampling (taking multiple samples from the sparser class) or undersampling (taking a subset of the samples from the larger class). Either way, the sample sizes of the two classes become balanced, with sensitivity (SE) ≈ specificity (SP). However, this approach fails quite frequently. Our approach instead penalizes misclassification of members of the smaller class until SE ≈ SP. Note that the penalty weight is generally not equal to the ratio of the class sizes.
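One way to realize penalty-based balancing is sketched below. A linear classifier is fitted by weighted least squares, with squared errors on the smaller class multiplied by a penalty weight. The weight is then scanned over a grid, and the value that brings SE and SP closest together is kept. The least-squares formulation, the weight grid, and the use of training-set SE and SP are our assumptions. In practice, SE and SP should be estimated by cross-validation.

```python
import numpy as np


def fit_weighted_ls(X, y, minority_weight):
    """Linear classifier from weighted least squares on targets -1/+1.

    Squared errors on class-1 samples (assumed to be the smaller class) are
    multiplied by minority_weight; class-0 samples keep weight 1.
    """
    A = np.column_stack([X, np.ones(len(X))])          # bias column
    t = np.where(y == 1, 1.0, -1.0)
    sw = np.sqrt(np.where(y == 1, minority_weight, 1.0))
    coef, *_ = np.linalg.lstsq(A * sw[:, None], t * sw, rcond=None)
    return coef


def se_sp(coef, X, y):
    """Sensitivity (class 1 correct) and specificity (class 0 correct)."""
    pred = (np.column_stack([X, np.ones(len(X))]) @ coef > 0).astype(int)
    return float(np.mean(pred[y == 1] == 1)), float(np.mean(pred[y == 0] == 0))


def balance_penalty(X, y, weights=np.linspace(1.0, 20.0, 39)):
    """Scan penalty weights; return (weight, |SE - SP|, SE, SP) with the smallest gap."""
    best = None
    for w in weights:
        se, sp = se_sp(fit_weighted_ls(X, y, w), X, y)
        if best is None or abs(se - sp) < best[1]:
            best = (float(w), abs(se - sp), se, sp)
    return best


rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 1.0, (180, 2)), rng.normal(1.5, 1.0, (20, 2))])
y = np.array([0] * 180 + [1] * 20)                      # many normals, few patients
print("weight 1:", se_sp(fit_weighted_ls(X, y, 1.0), X, y))
weight, gap, se, sp = balance_penalty(X, y)
```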

2.5. Classifier Aggregation

Clinically relevant classifiers require statistically significant class assignments for the samples. Thus, when a classifier’s assignment probability for a sample is “fuzzy” (e.g., less than 75% for a two-class problem), that assignment is not really useful from a clinical point of view. If the overall accuracy of a classifier is low and the assignments are fuzzy, a multiple-classifier strategy (classifier aggregation) can frequently be beneficial. The idea is to combine the outputs of several classifiers, expecting the combined classifier to be more accurate and less fuzzy than the best of its individual constituents.

Diversity is one of the requirements for accurate ensemble-based classifiers: it is believed that the component classifiers should be as different as possible. This can be achieved in several ways. One approach applied conceptually and methodologically very different classifiers [linear discriminant analysis (LDA), neural nets, and dynamic programming] to the same, unmodified data (17). However, our more recent experiments and experience suggest that classifier diversity is not necessarily required. Comparable accuracy can be achieved more simply by using a single, simple classifier (e.g., LDA) and producing diversity through different transformations of the data. We have already discussed some of these transformations in the context of preprocessing.

How are we to combine the outcomes of the various classifiers? Combination rules range from the simple majority rule to more complex, trainable rules, e.g., stacked generalization (SG) (18). SG uses the output probabilities of the constituent classifiers as input features for a new classifier. Boosting (19) is a very powerful version of a learnable classifier combination rule (see Note 8). It has been used to identify proteomic biomarkers for cancer detection (20). There are many classifier combination rules; when choosing one, it is important to take into account both sample size and classifier complexity.
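The aggregation ideas above can be sketched with a single simple classifier (LDA) made diverse by different data transformations. The outputs are combined either by majority vote or by stacked generalization. This is our illustration, not the author's implementation. The following are all assumptions:

- the three transformations,
- the LDA posterior formula (Gaussian classes with a shared covariance),
- the five-fold out-of-fold scheme used to train the stacked meta-classifier, and
- the 75% fuzziness helper.

Boosting is not shown.

```python
import numpy as np


def fit_lda(X, y, ridge=1e-6):
    """Two-class LDA with pooled covariance; returns (w, b) for the posterior logit w @ x + b."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    pooled = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X) - 2)
    w = np.linalg.solve(pooled + ridge * np.eye(X.shape[1]), m1 - m0)
    b = -w @ (m0 + m1) / 2 + np.log(len(X1) / len(X0))
    return w, b


def lda_proba(X, model):
    w, b = model
    return 1.0 / (1.0 + np.exp(-np.clip(X @ w + b, -30, 30)))


# Diversity from different transformations of the same spectra, one classifier type (LDA)
TRANSFORMS = [
    lambda S: S,                                                  # raw
    lambda S: np.argsort(np.argsort(S, axis=1), axis=1) + 1.0,    # within-spectrum ranks
    lambda S: np.diff(S, axis=1),                                 # first derivative
]


def base_probas(S_train, y_train, S_eval):
    """P(class 1) from one LDA per transformation; shape (n_eval, n_transforms)."""
    return np.column_stack([lda_proba(t(S_eval), fit_lda(t(S_train), y_train))
                            for t in TRANSFORMS])


def majority_vote(probas):
    """Simple majority rule over the base classifiers' hard decisions."""
    return (np.mean(probas > 0.5, axis=1) > 0.5).astype(int)


def fit_stacked(S, y, k=5, seed=0):
    """Stacked generalization: a meta-LDA trained on out-of-fold base-classifier probabilities."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    oof = np.zeros((len(y), len(TRANSFORMS)))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        oof[test_idx] = base_probas(S[train_idx], y[train_idx], S[test_idx])
    return fit_lda(oof, y)


def predict_stacked(meta, S_train, y_train, S_new):
    return (lda_proba(base_probas(S_train, y_train, S_new), meta) > 0.5).astype(int)


def is_fuzzy(p, threshold=0.75):
    """Flag assignments whose larger class probability is below the threshold."""
    return np.maximum(p, 1.0 - p) < threshold


rng = np.random.default_rng(4)
y = np.repeat([0, 1], 40)
S = rng.normal(0.0, 1.0, (80, 20))            # 80 synthetic "spectra", 20 points each
S[y == 1, 8:12] += 3.0                        # class-1 peak
P = base_probas(S, y, S)
vote = majority_vote(P)
meta = fit_stacked(S, y)
stacked = predict_stacked(meta, S, y, S)
fuzzy = is_fuzzy(lda_proba(P, meta))
```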


3. Discussion

Of course, experimental quality control is essential for good classifiers, i.e., those that have useful generalization properties. Much has been made of the “surprising” observation that different (or even the same) experimental groups, using different classifiers, end up with totally different sets of discriminatory features (21). These differences have been ascribed to various possible experimental differences in spectral acquisition, etc. (22,23,24). These are indeed significant contributing factors and must be considered and corrected. Yet an important fact is often overlooked: nonunique discriminatory sets are as likely to be caused by dataset sparsity (3) as by differences in experimental protocols.

The initial euphoria is over: one cannot (or should not be able to) publish proteomic results based on very limited sample sizes in prestigious journals (e.g., Science, Nature, Lancet, PNAS). Furthermore, even when there are enough data to produce a respectable classifier, high-impact journals are unlikely to accept a manuscript unless the results are independently validated. In particular, the chemical/biological identification of the discriminatory proteins, protein fragments, or peptides must accompany the classification results. This increased focus on establishing the clinical relevance of putative biomarkers is definitely a good sign. However, at this stage it is possibly premature, and one would prefer first to have a quick, noninvasive, reliable diagnostic/prognostic tool.

To be clinically relevant, such a tool (i.e., a sufficiently robust classifier) requires many more samples to develop, a requirement that will likely rule out the reliable detection of rare diseases. Unfortunately, currently available sample sizes preclude the discovery of unique biomarker “fingerprints” of a disease. This nonuniqueness, caused by data sparsity, inevitably leads to expensive, onerous, and unnecessary laboratory investigations. Their purpose is to sift medically relevant, unique subsets from the plethora of putative biomarkers found and suggested for various diseases. Understanding the biochemical causes is, of course, essential for, say, finding a possible cure, but it should follow the diagnostic/prognostic stage. Despite such caveats, the proteomics field is maturing. Once the technical problems are successfully resolved, it will undoubtedly provide important medical/clinical insights.

The author further suggests that the power of proteomic spectroscopy can be enhanced by simultaneously considering other experimental modalities that complement PMS. MRS is especially promising, because it could identify smaller discriminatory compounds also present in biofluids.

4. Notes

1. These difficulties include:
- correcting nonflat baselines arising from the matrix material,
- aligning the peaks of the spectra,
- reconciling data acquired at different times, in different laboratories, or with mass spectrometers of different sensitivity, and
- correcting high-frequency noise.

Proper experimental design, including rigorous quality assessment and control, is essential before any classifier development is attempted. Good discussions and summaries are given in (21,22,23,24).

2. The realization that some classification strategy is essential for the analysis of proteomic data is recent. That these strategies differ emphasizes that there is no best classifier, and also that no unique, best strategy exists either. Different groups have discovered different strategies that worked well for the data they analyzed (20,25). What all these strategies have in common is that they are multistage.

3. The data-driven nature of the SCS emphasizes that there is no simple, universal prescription for creating an optimal classifier (4). That is, no simple, ready “recipe” is available, nor is one likely to become available.

4. This much-improved result underscores the importance of feature selection. Note that both mappings used the Euclidean distance. This was necessary for the full 15,154-dimensional space, because with far fewer samples than features the sample covariance matrix is singular. One therefore cannot use any distance measure that involves matrix inversion (e.g., Mahalanobis). After feature selection, when there are fewer features than samples, much more powerful and relevant distance measures can be used. For a fair comparison, the Euclidean distance is used for both cases presented in Fig. 2 [for further possible improvements obtainable using other distance measures, see (6)].

5. In practice, SVMs are not nearly as effective as theory suggests. In fact, we have found (26) that a simple LDA classifier with wrapper-driven feature selection generally outperformed a linear SVM, even when the latter was also used with feature selection. This held for several publicly available proteomic mass spectral datasets and for six microarray datasets. Furthermore, SVM-based classifiers frequently produce distinctly unbalanced classification results: the accuracy obtained for one of the classes is usually considerably better. This imbalance between sensitivity and specificity is of clinical relevance when trying to minimize false negatives and/or false positives.

6. Different variants of CV deal differently with the so-called bias–variance dilemma, which is particularly acute for datasets of limited sample size.

The simplest version is the leave-one-out (LOO) method. It removes one of the N samples, develops a classifier with the remaining N – 1 samples, and tests its prediction accuracy on the left-out sample. Cycling through all N samples yields N accuracy assessments. For small N (when the data partition described in the main text is not possible), LOO suffers from large variance, even though it minimizes the bias.

K-fold CV is frequently used to balance bias and variance. The samples are partitioned into K roughly equal subsets; K – 1 subsets are used to train the classifier, while the left-out subset serves as the current test set. Cycling through the K partitions yields K test-set accuracies, whose mean and standard deviation indicate how well, and how reliably, one can expect to classify new, unknown samples. K is typically chosen to be 5 or 10, whether or not the sample size warrants this choice. A more reasonable approach is to determine the best K via CV.

Particularly powerful is Efron’s bootstrap approach (15). This approach involves the entire dataset but uses random resampling with replacement. A large number of artificial datasets of the same size as the original are thus produced. A classifier is created for each of these, and the outcomes are averaged. Bootstrapping is supposed to reduce both large bias and large variance. Inspired by the bootstrap concept, we have been using its generalization (27), with some success.

7. Instead of using nonlinear classifiers directly, with their attendant optimization problems, a simple trick is to introduce nonlinear terms while retaining the simplicity of a linear classifier. One approach we found useful is to first develop a linear classifier (with feature selection) and then augment the selected linear features with nonlinear functions constructed from them, such as quadratic terms. This, of course, increases the number of parameters to be determined. However, the problem remains linear in the augmented feature space, so linear classifiers can still be developed. Furthermore, our explicit approach produces new features that remain interpretable as interaction terms. This is unlike SVM classifiers, which map implicitly into a much higher-dimensional feature space and offer no such interpretability. In addition, we can reduce the dimensionality of the augmented feature space by further feature selection via exhaustive search, optimized by CV.

8. Boosting requires “weak” base classifiers $C_j$, $j = 1, 2, \ldots, J$, that are combined into a more accurate composite classifier, $D_J = C_1 + C_2 + \cdots + C_J$. At stage $m$, the boosting algorithm carries out a weighted selection of a base classifier, given all previously chosen base classifiers. For the new base classifier $C_m$, larger weights are given to samples that are incorrectly classified by the current composite classifier $D_{m-1}$, so that $C_m$ will tend to classify correctly the samples that were previously misclassified.
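Note 4's remark about matrix inversion can be checked directly. The following pure-Python sketch is illustrative only and is not the author's software. The function names `covariance`, `numerical_rank` and `mahalanobis`, and the toy data, are assumptions. With more features than samples, the sample covariance matrix has rank at most n − 1, so the Mahalanobis distance, which needs its inverse, is undefined. After reduction to a few features, it can be computed.

```python
from typing import List

Matrix = List[List[float]]


def covariance(samples: Matrix) -> Matrix:
    """Unbiased sample covariance of an n x p data matrix (rows are samples)."""
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in samples]
    return [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(p)]
            for a in range(p)]


def numerical_rank(m: Matrix, tol: float = 1e-9) -> int:
    """Rank via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    rows, cols, r = len(a), len(a[0]), 0
    for c in range(cols):
        if r == rows:
            break
        piv = max(range(r, rows), key=lambda i: abs(a[i][c]))
        if abs(a[piv][c]) < tol:
            continue  # no usable pivot in this column
        a[r], a[piv] = a[piv], a[r]
        for i in range(r + 1, rows):
            f = a[i][c] / a[r][c]
            for k in range(c, cols):
                a[i][k] -= f * a[r][k]
        r += 1
    return r


def mahalanobis(x: List[float], y: List[float], cov: Matrix, tol: float = 1e-9) -> float:
    """sqrt((x - y)^T cov^-1 (x - y)), solving cov z = x - y by Gauss-Jordan elimination."""
    n = len(cov)
    d = [xi - yi for xi, yi in zip(x, y)]
    a = [cov[i][:] + [d[i]] for i in range(n)]  # augmented matrix [cov | d]
    for c in range(n):
        piv = max(range(c, n), key=lambda i: abs(a[i][c]))
        if abs(a[piv][c]) < tol:
            raise ValueError("covariance matrix is singular; Mahalanobis distance undefined")
        a[c], a[piv] = a[piv], a[c]
        for i in range(n):
            if i != c:
                f = a[i][c] / a[c][c]
                for k in range(c, n + 1):
                    a[i][k] -= f * a[c][k]
    z = [a[i][n] / a[i][i] for i in range(n)]
    return sum(di * zi for di, zi in zip(d, z)) ** 0.5


# Hypothetical data: n = 3 samples, p = 4 features (features outnumber samples).
wide = [[1.0, 2.0, 3.0, 4.0], [2.0, 0.0, 1.0, 5.0], [4.0, 1.0, 0.0, 2.0]]
print(numerical_rank(covariance(wide)))  # at most n - 1 = 2, so the 4 x 4 matrix is singular
try:
    mahalanobis([0.0] * 4, [1.0] * 4, covariance(wide))
except ValueError as err:
    print(err)

# With only 2 (hypothetically selected) features, the covariance can be inverted.
print(mahalanobis([0.0, 0.0], [3.0, 4.0], [[4.0, 0.0], [0.0, 1.0]]))
```

With an identity covariance matrix, the Mahalanobis distance reduces to the Euclidean distance used for both mappings in Fig. 2.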
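The sensitivity/specificity imbalance mentioned in Note 5 becomes concrete when the per-class rates are computed separately. In this small sketch, the label convention (1 = disease, 0 = control), the function name `class_rates` and the lopsided predictions are all hypothetical. It shows how a respectable overall accuracy can hide a classifier that fails completely on one class. Balanced accuracy, the mean of sensitivity and specificity, is one common way to expose this.

```python
from typing import Dict, Sequence


def class_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Per-class rates for binary labels (1 = disease, 0 = control; an assumed convention)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # fraction of diseased samples detected
    specificity = tn / (tn + fp)  # fraction of controls correctly classified
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / len(pairs),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hypothetical, deliberately lopsided result: 15 patients, 5 controls,
# and a classifier that labels every sample as "disease".
y_true = [1] * 15 + [0] * 5
y_pred = [1] * 20
print(class_rates(y_true, y_pred))
# {'sensitivity': 1.0, 'specificity': 0.0, 'accuracy': 0.75, 'balanced_accuracy': 0.5}
```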
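The K-fold procedure in Note 6, with LOO as the special case K = N, can be sketched as follows. The partitioning and the mean/standard-deviation summary follow the text. The nearest-class-mean classifier (`nearest_mean_score`), the helper names and the one-feature data are hypothetical placeholders for whatever classifier and spectra are actually used.

```python
import random
import statistics
from typing import Callable, List, Tuple

Split = Tuple[List[int], List[int]]


def kfold_indices(n: int, k: int, seed: int = 0) -> List[Split]:
    """Partition range(n) into k disjoint, roughly equal test folds.

    Returns one (train, test) pair per fold; k == n gives leave-one-out.
    """
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [sorted(idx[i::k]) for i in range(k)]  # fold sizes differ by at most one
    splits = []
    for i, test in enumerate(folds):
        train = sorted(j for m, fold in enumerate(folds) if m != i for j in fold)
        splits.append((train, test))
    return splits


def cross_validate(n: int, k: int, fit_and_score: Callable[[List[int], List[int]], float],
                   seed: int = 0) -> Tuple[float, float]:
    """Mean and standard deviation of the test accuracies over the k folds."""
    accs = [fit_and_score(train, test) for train, test in kfold_indices(n, k, seed)]
    return statistics.mean(accs), statistics.stdev(accs)


# Hypothetical toy data: one feature, two classes.
xs = [0.1, 0.3, 0.2, 0.4, 0.5, 0.9, 1.1, 1.0, 0.8, 0.6]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]


def nearest_mean_score(train: List[int], test: List[int]) -> float:
    """Toy classifier: assign each test sample to the closer training-class mean."""
    means = {}
    for c in {ys[i] for i in train}:
        vals = [xs[i] for i in train if ys[i] == c]
        means[c] = sum(vals) / len(vals)
    hits = sum(min(means, key=lambda c: abs(xs[i] - means[c])) == ys[i] for i in test)
    return hits / len(test)


print(cross_validate(len(xs), 5, nearest_mean_score))        # 5-fold CV
print(cross_validate(len(xs), len(xs), nearest_mean_score))  # LOO (k = n)
```

Choosing K by CV, as the note recommends, amounts to repeating `cross_validate` for several candidate values of K and comparing the resulting summaries.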
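Note 6 describes bootstrapping only in outline: resample with replacement, build a classifier for each artificial dataset, and average the outcomes. To score each replicate, this sketch assumes one common choice: test on the out-of-bag samples that were never drawn. That choice is not taken from the chapter, and refinements such as Efron's .632 estimator are not shown. The toy classifier, the function names and the data are hypothetical.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple


def fit_nearest_mean(xs: Sequence[float], ys: Sequence[int]) -> Dict[int, float]:
    """Toy 1-D classifier used as a stand-in: the mean feature value of each class."""
    means: Dict[int, float] = {}
    for c in set(ys):
        vals = [x for x, y in zip(xs, ys) if y == c]
        means[c] = sum(vals) / len(vals)
    return means


def predict_nearest_mean(means: Dict[int, float], x: float) -> int:
    """Assign x to the class whose mean is closest."""
    return min(means, key=lambda c: abs(x - means[c]))


def bootstrap_accuracy(xs: Sequence[float], ys: Sequence[int],
                       n_boot: int = 200, seed: int = 0) -> Tuple[float, float]:
    """Mean and standard deviation of out-of-bag accuracy over bootstrap replicates."""
    rng = random.Random(seed)
    n = len(xs)
    scores: List[float] = []
    for _ in range(n_boot):
        # Resample with replacement: an artificial dataset of the original size.
        drawn = [rng.randrange(n) for _ in range(n)]
        drawn_set = set(drawn)
        # Samples never drawn in this replicate serve as its test set.
        oob = [i for i in range(n) if i not in drawn_set]
        if not oob:
            continue
        model = fit_nearest_mean([xs[i] for i in drawn], [ys[i] for i in drawn])
        hits = sum(predict_nearest_mean(model, xs[i]) == ys[i] for i in oob)
        scores.append(hits / len(oob))
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical toy data: one feature, two classes.
xs = [0.1, 0.3, 0.2, 0.4, 0.5, 0.9, 1.1, 1.0, 0.8, 0.6]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(bootstrap_accuracy(xs, ys))
```

Unlike K-fold CV, every replicate here trains on a dataset as large as the original. The price is that some samples appear several times in a replicate while others do not appear at all.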
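The idea in Note 7 (augment the selected features with quadratic and interaction terms, then fit a linear classifier in the augmented space) can be sketched as follows. A plain perceptron stands in for the chapter's LDA-based classifier only to keep the example short. The XOR-like data and the helpers `augment` and `train_perceptron` are hypothetical. The points are not linearly separable in (x1, x2), but they become separable once the product term is added, and the learned weight lands on a named, interpretable term.

```python
from itertools import combinations_with_replacement
from typing import List, Sequence, Tuple


def augment(x: Sequence[float], names: Sequence[str]) -> Tuple[List[float], List[str]]:
    """Append squares and pairwise products, keeping a readable name for each term."""
    feats, labels = list(x), list(names)
    for i, j in combinations_with_replacement(range(len(x)), 2):
        feats.append(x[i] * x[j])
        labels.append(f"{names[i]}^2" if i == j else f"{names[i]}*{names[j]}")
    return feats, labels


def train_perceptron(rows: List[List[float]], ys: List[int], epochs: int = 100):
    """Plain perceptron (labels +1/-1) used here as a stand-in linear classifier."""
    w, b = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for r, y in zip(rows, ys):
            if y * (sum(wi * ri for wi, ri in zip(w, r)) + b) <= 0:
                w = [wi + y * ri for wi, ri in zip(w, r)]
                b += y
                mistakes += 1
        if mistakes == 0:  # converged: every training sample is on the correct side
            break
    return w, b


# Hypothetical XOR-like data: the class is the sign of x1 * x2.
points = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
labels = [1, 1, -1, -1]
rows, names = [], []
for p in points:
    feats, names = augment(p, ["x1", "x2"])
    rows.append(feats)
w, b = train_perceptron(rows, labels)
preds = [1 if sum(wi * ri for wi, ri in zip(w, r)) + b > 0 else -1 for r in rows]
print(dict(zip(names, w)), b)  # the weight ends up on the interaction term x1*x2
```

Further feature selection in the augmented space, as the note describes, would then amount to searching over subsets of the named terms.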
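Note 8 describes boosting in general terms. The sketch below uses AdaBoost, one standard boosting algorithm, with one-dimensional threshold "stumps" as the weak base classifiers. Unlike the plain sum written in the note, AdaBoost combines the base classifiers as a weighted sum, $\sum_j \alpha_j C_j$. The data and the function names are illustrative assumptions. No single stump can separate the interval-shaped classes below, but a few boosted stumps can.

```python
import math
from typing import List, Optional, Sequence, Tuple

Stump = Tuple[float, int]  # (threshold, polarity)


def stump_predict(stump: Stump, x: float) -> int:
    """Weak base classifier: +polarity above the threshold, -polarity at or below it."""
    thr, pol = stump
    return pol if x > thr else -pol


def best_stump(xs: Sequence[float], ys: Sequence[int], w: Sequence[float]) -> Tuple[float, Stump]:
    """Weighted selection: the stump with the smallest weighted training error."""
    vals = sorted(set(xs))
    thresholds = [vals[0] - 1.0] + [(a + b) / 2 for a, b in zip(vals, vals[1:])]
    best: Optional[Tuple[float, Stump]] = None
    for thr in thresholds:
        for pol in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w) if stump_predict((thr, pol), xi) != yi)
            if best is None or err < best[0]:
                best = (err, (thr, pol))
    assert best is not None
    return best


def adaboost(xs: Sequence[float], ys: Sequence[int], rounds: int) -> List[Tuple[float, Stump]]:
    n = len(xs)
    w = [1.0 / n] * n
    ensemble: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        err, stump = best_stump(xs, ys, w)
        err = min(max(err, 1e-12), 1 - 1e-12)  # guard against log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified samples are up-weighted; cumulatively, each weight is proportional
        # to exp(-y * composite score), so samples the composite gets wrong weigh more.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi)) for xi, yi, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble: List[Tuple[float, Stump]], x: float) -> int:
    """Composite classifier: sign of the alpha-weighted vote of the base classifiers."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical 1-D data: class +1 at both ends, class -1 in the middle.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1, 1, -1, -1, 1, 1]
for rounds in (1, 3):
    model = adaboost(xs, ys, rounds)
    print(rounds, [predict(model, x) for x in xs])
```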

Acknowledgments

The author thanks the entire Biomedical Informatics Group for their decade-long, essential contributions to the development of the algorithms and software described.

References

1. Lean, C. L., Somorjai, R. L., Smith, I. C. P., Russell, P., Mountford, C. E. (2002) Accurate diagnosis and prognosis of human cancers by proton MRS and a three-stage classification strategy. Annual Reports on NMR Spectroscopy 48, 71–111.
2. Somorjai, R. L., Dolenko, B., Nikulin, A., Nickerson, P., Rush, D., Shaw, A. et al. (2002) Distinguishing normal from rejecting renal allografts: application of a three-stage classification strategy to MR and IR spectra of urine. Vibrational Spectroscopy 28, 97–102.
3. Somorjai, R. L., Dolenko, B., Baumgartner, R. (2003) Class prediction and discovery using gene microarray and proteomics mass spectroscopy data: curses, caveats, cautions. Bioinformatics 19, 1484–1491.
4. Huber, P. J. (1985) Projection pursuit. Annals of Statistics 13, 435–475.
5. Somorjai, R. L., Alexander, M., Baumgartner, R., Booth, S., Bowman, C., Demko, A., Dolenko, B., Mandelzweig, M., Nikulin, A. E., Pizzi, N., Pranckeviciene, E., Summers, R., Zhilkin, P. (2004) A data-driven, flexible machine learning strategy for the classification of biomedical data. In: Dubitzky, W. and Azuaje, F. (eds.) Artificial Intelligence Methods and Tools for Systems Biology, Chapter 5. Computational Biology Series, Vol. 5. Springer, pp. 67–85.
6. Somorjai, R. L., Demko, A., Mandelzweig, M., Dolenko, B., Nikulin, A. E., Baumgartner, R. et al. (2004) Mapping high-dimensional data onto a relative distance plane – a novel, exact method for visualizing and characterizing high-dimensional patterns. Journal of Biomedical Informatics 37, 366–379.
7. Anderson, T. W., Bahadur, R. R. (1962) Classification into two multivariate normal distributions with different covariance matrices. Annals of Mathematical Statistics 33, 420–431.
8. Kohavi, R., John, G. H. (1997) Wrappers for feature subset selection. Artificial Intelligence 273–324.
9. Nikulin, A. E., Dolenko, B., Bezabeh, T., Somorjai, R. L. (1998) Near-optimal region selection for feature space reduction: novel preprocessing methods for classifying MR spectra. NMR in Biomedicine 11, 209–217.
10. Li, J., Zhang, Zh., Rosenzweig, J., Wang, Y. Y., Chan, D. W. (2002) Proteomics and bioinformatics approaches for identification of serum biomarkers to detect breast cancer. Clinical Chemistry 48, 1296–1304.
11. Dataset “JNCI-7-3-02,” downloaded from the NIH/FDA Clinical Proteomics Program Databank (http://clinicalproteomics.steem.com).
12. Vapnik, V. N. (2000) The Nature of Statistical Learning Theory, 2nd edition. Statistics for Engineering and Information Science. Springer, New York.
13. Schölkopf, B., Smola, A. J. (2002) Learning with Kernels: Support Vector Machines, Regularization, and Beyond. The MIT Press, Cambridge, Mass.
14. Lee, K. R., Lin, X., Park, D. C., Eslava, S. (2003) Megavariate data analysis of mass spectrometric proteomics data using latent variable projection method. Proteomics 3, 1680–1686.
15. Efron, B. (1982) The Jackknife, the Bootstrap and Other Resampling Plans. SIAM, Philadelphia.
16. Diamandis, E. P. (2003) Proteomic patterns in biological fluids: do they represent the future of cancer diagnostics? Clinical Chemistry 49(8), 1272–1278.
17. Somorjai, R. L., Nikulin, A. E., Pizzi, N., Jackson, D., Scarth, G., Dolenko, B., Gordon, H., Russel, P., Lean, C. L., Delbridge, L., Mountford, C. E., Smith, I. C. P. (1995) Computerized consensus diagnosis: a classification strategy for the robust analysis of MR spectra. I. Application to ¹H spectra of thyroid neoplasms. Magnetic Resonance in Medicine 33, 257–263.
18. Wolpert, D. H. (1992) Stacked generalization. Neural Networks 5, 241–259.
19. Schapire, R. E. (1990) The strength of weak learnability. Machine Learning 5, 197–227.
20. Yasui, Y., Pepe, M., Thomson, M. L., Adam, B.-L., Wright Jr., G. L., Qu, Y., Potter, J. D., Winget, M., Thornquist, M., Feng, Z. (2003) A data-analytic strategy for protein biomarker discovery: profiling of high-dimensional data for cancer detection. Biostatistics 3, 449–463.
21. Diamandis, E. P. (2004) Mass spectrometry as a diagnostic and a cancer biomarker discovery tool. Molecular and Cellular Proteomics 3(4), 367–378.
22. Baggerly, K. A., Morris, J. S., Coombes, K. (2004) Cautions about reproducibility in mass spectrometry patterns: joint analysis of several proteomic data sets. Bioinformatics 20, 777–785.
23. Hu, J., Coombes, K. R., Morris, J. S., Baggerly, K. A. (2005) The importance of experimental design in mass spectrometry experiments: some cautionary tales. Briefings in Functional Genomics and Proteomics 3(4), 322–331.
24. Shin, H. and Markey, M. K. (2006) A machine learning perspective on the development of clinical decision support systems utilizing mass spectra of blood samples. Journal of Biomedical Informatics 39, 2237–2248.
25. Zhu, W., Wang, X., Ma, Y., Rao, M., Glimm, J., Kovach, J. S. (2003) Detection of cancer-specific markers amid massive mass spectral data. Proceedings of the National Academy of Sciences USA 100(25), 14666–14671.
26. Somorjai, R. L. and Pranckeviciene, E. (2006) (Unpublished).
27. Somorjai, R. L., Dolenko, B., Nikulin, A., Nickerson, P., Rush, D., Shaw, A., De Glogowski, M., Rendell, J., Deslauriers, R. (2002) Distinguishing normal from rejecting renal allografts: application of a three-stage classification strategy to MR and IR spectra of urine. Vibrational Spectroscopy 28, 97–102.


Index

Affi-gel Protein A MAPS II kit, 277
Aflatoxin B1 (AFB1), 194
Alkaline phosphatase (ALP) assay, 233, 237
Alpha-fetoprotein, 194
Alzheimer’s disease, 310
Annexin V, 172
ANOVA, analysis of variance, 100, 112, 114, 259, 330, 335, 344
Antibody arrays
  construction, 270–272
  direct labeling methods, for cancer diagnostics, 268–269
  formats for, 264–266
  labeling and hybridization, of serum samples, 269–270, 272–274
  and other proteomic strategies, 263–264
  planar, labeling-hybridization methods and, 266–268
  printing, 269
  scanning and data analysis, 274
Anti-SAPE antibody, 267
ArrayQuant scanners, 281
AutoPix™, 48. See also Laser-capture microdissection
Axon scanners, 281
Bayesian classification methods. See Linear Discriminant Analysis
Bayes’s rule, 300
BCA 200 Protein Assay Kit, 277
Bead-based multiplex assays. See also Suspension antibody microarrays
  detection antibody, 254
  diluents, 254
  general protocol for, 254–255
  sample preparation, 252–254
  screening protocol, 255–256
Biological variation analysis (BVA) module, of DeCyder, 112–113
“Biomarker panel,” 11
Bio-Rad Micro Bio-Spin P30 column, 277
Biotinyl-tyramide, 275
BLAST, 352, 358
Blood samples, preanalytical phase
  collection of, 36
  processing of, 37–38
  protease inhibitors, 38
  serum and plasma specimens, characteristics of, 36–37
Bradford assay, 225
Carboxylated beads, 249. See also Suspension antibody microarrays
  activation, 251
  antibodies coupling to activated, 251
  cell-counting chamber and, 252
  washing and storage of coupled, 251
1-(5-Carboxypentyl)-1-methylindodi-carbocyanine halide (Cy5) N-hydroxy-succinimidyl ester, 163
1-(5-Carboxypentyl)-1-propylindocarbocyanine halide (Cy3) N-hydroxy-succinimidyl ester, 163
CAST. See Clustering Affinity Search Technique
Celecoxib, and cyclooxygenase-2 (COX-2), 183
Charge-coupled device (CCD) camera-based imaging system, 268, 293, 332
CIMminer (Clustered Image Maps), 259
Cleavable isotope-coded affinity tag (cICAT) labeling technology, 195, 197, 200–201
Clinical proteomics, 1
  biological specimens, 6–7
  biomarker discovery and, 9–14
  overview and scope of, 2–3
  sample specimens and processing techniques, 4–9
Cluster analysis techniques, 297–299, 306
  gene expression-based, 307
Clustering Affinity Search Technique, 259
Coomassie brilliant blue (CBB) staining, 68, 332, 339
Creatinine assay, 142
Cyanines (Cy3/Cy5), 264, 333
Cyclooxygenase-2 (COX-2) and celecoxib, 183
CyDye labeling, 95, 105–106, 109–110. See also Difference gel electrophoresis (DIGE) technology
  Cy2-labeled internal standard, 98–99
  minimal labeling method, 96
  pooled-sample internal standard for, 107
  saturation labeling, 96
Cy3-labeled streptavidin, 267
Cytokeratin 19 (CK19), 163
DA-PLS method. See Discriminant analysis–partial least squares method
DeCyder software, 101, 112–113, 342. See also Difference gel electrophoresis (DIGE) technology
Delayed extraction-matrix assisted laser desorption/ionization time-of-flight mass spectrometry (DE-MALDI-TOF-MS), 194
Dendrogram, 297, 299
Dialysis, 150. See also Urine protein profiling, by 2DE and MALDI-TOF-MS
Difference gel electrophoresis (DIGE) technology, 78, 93, 330, 332–333, 342–345
  ANOVA, 100, 112, 114
  in clinical setting, 103
  CyDye labeling, 95, 105–106, 109–110
    Cy2-labeled internal standard, 98–99
    minimal labeling method, 96
    pooled-sample internal standard for, 107
    saturation labeling, 96
  DeCyder suite of software tools, 101, 112–113
  2D gel electrophoresis and poststaining, 94, 110–111
  experimental design, 108–109
    and statistical confidence, 112–114
  extended data analysis (EDA) software module, 101, 113
  false discovery rate (FDR), 100
  hierarchical clustering (HC), 102
  labeling materials, 104–105
  LCM and, 163–170
  MeOH/CHCl₃ protocol, 106
  MuDPIT, 97
  multivariate statistical analysis, 114–115
  principal component analysis, 101
  SDS-polyacrylamide gel electrophoresis, 104
  software algorithms, 111–112
  Student’s t-test, 100, 112, 114
DIGE/MS analysis, 103, 115
Direct labeling, 264, 268
  protocol for, 272–274
Discriminant analysis–partial least squares method, 306, 309–311
Discrimination power (DP), 303–305
Dithiothreitol (DTT), 68
Dot-plot style alignment, of protein sequence, 358–359
DTT/IAA equilibration procedure, 73
ECM. See Extracellular matrix
EDA software. See Extended data analysis software
EDC/Sulfo-NHS, 249. See also Suspension antibody microarrays
2DE-MALDI-TOF-MS assay, 194
EnsEmbl, 352, 356
Escherichia coli, 307
Ethylene vinyl acetate (EVA) polymer, 161
Ettan 2D electrophoresis system, 110
Exosomes, 142
ExPASy proteomics tools, 202, 352
Expressed sequence tags (ESTs), 357
Extended data analysis software, 101, 113
Extracellular matrix, 8
  and matrix vesicles (MVs) proteomes, MS and, 231–232
    alkaline phosphatase assay, 234, 237
    immunofluorescence staining and, 235, 239
    MC3T3-E1, osteoblast cell line, 233, 236–237, 239
    nanoRPLC-MS/MS, 235, 238–239
    strong cation exchange liquid chromatography, of peptides, 234–235, 238
Extracted ion chromatogram, 219, 221–222, 224
Fetal bovine serum (FBS), 254
Fisher’s F-test, 302
Flow cytometric analysis, 160
Fluorophores, 264, 267
  photobleaching and quenching of, 274–275
Fourier transform mass spectrometry (FTMS), 172–174
Free flow electrophoresis (FFE), plasma samples fractionation and, 60–61, 67
Frontotemporal dementia, 310
GAORS method. See Genetic algorithm-based optimal region selection method
2D Gaussian function, 312
Gaussian multivariate probability distribution, 300
2-D Gel-electrophoresis (2-D GE), 292. See also 2D-PAGE maps analysis
  LCM cells analysis by, 77
    HER-2/neu positive and -negative breast tumors, 87–88
    isoelectric focusing (IEF), 79–80, 83–84
    MASCOT search engine, 87
    paraffin-embedded sections staining, 81–82
    preparation and analysis, 61, 67–69
    protein sample preparation, 79, 82–83
    SDS-PAGE, 79–80, 84–85
    silver staining and image analysis, 80, 85–86
    tissue block and tissue section preparation, 78–79, 81
    trypsin digestion and MS analysis, 80, 86–87
Gel-free mass spectrometry and LCM, 171–172
Gene expression microarrays, 45
GenePix Pro 3.0 software program, 280–281
GeneScan program, 356
Genetic algorithm-based optimal region selection method, 387–388. See also Proteomic mass spectroscopy
gp96, tumor rejection antigen, 169
GRANTA-519, 308
HCC. See Hepatocellular carcinoma
HCL. See Hierarchical clustering
Hematoxylin and eosin (H&E) staining, tissue sample collection, 44, 47–48
Hepatitis B/C virus (HBV/HCV), 194
Hepatocellular carcinoma, 8, 11, 59, 67, 163, 170, 193
  qualitative and quantitative proteomic analysis of
    cICAT labeling technology, 195, 197, 200–201
    2DE-MALDI-TOF-MS assay, 194
    2D-LC-MS/MS for, 195–197, 201–202
    ExPASy proteomics tools, 202
    LCM for, 194–196, 199
    nonenzymatic method (NESP), 196, 198–199
    toluidine blue removal and protein mixture digestion, 197, 199–200
HERMeS software package, PCA and, 306
HER-2/neu oncogene, 85–86, 163
Hierarchical clustering, 259, 299. See also Cluster analysis techniques
High performance liquid chromatography, 169, 171, 183, 212–214
Horseradish peroxidase (HRP), 267
HPLC. See High performance liquid chromatography
HSP27 protein, 103
HT-29, COX-2 expressing colon cancer cell line, 183
Human Proteome Organization, 143
Hydrogels, 271. See also Antibody arrays
ICAT labeling. See Isotope-coded affinity tag labeling
IMAC-Cu²⁺ ProteinChips, 134, 136
Image analysis. See also 2D-PAGE maps analysis
  by fuzzy logic principles
    image defuzzification, 312
    image digitalization, 311–312
    multi-dimensional scaling (MDS), 315–317
    PCA and classification methods, 315
    refuzzification, 312–313
  moment functions, 317
    Legendre moments, 318–319
Image Master Platinum software, 339, 341
Immobilized pH gradient strip. See also Two-dimensional electrophoresis (2DE)
  isoelectric focusing (IEF) with, 60, 65
  rehydration of, 64–65
Immunofluorescence staining, 235
InterPro, 352, 361
Iodoacetamide (IAA), 68
IPG strip. See Immobilized pH gradient strip
Isotope-coded affinity tag labeling, 78, 195
  mass spectrometry (MS) and, 181
    celecoxib, cyclooxygenase-2 (COX-2) and, 183
    cell culture and harvest, 183, 186
    cell lysis, desalting, and protein quantitation, 184–187
    cleavable reagents, 182, 185, 187–188
    cleaving biotin, 186, 189
    labeled peptides purification, 185–186, 188–189
    proteins, denaturation and reduction of, 185, 187
    quantitative proteomic analysis and, 184
Java Runtime Environment, 370. See also msInspect, for LC-MS data analysis
KMC (K-Means/K-Medians Clustering), 259
Kolmogorov–Smirnov test, 335, 339, 341
Kruskal–Wallis test, 335
Laser-capture microdissection, 8, 44–45, 160. See also Tissue sample collection, for proteomics analysis
  AutoPix™, 48
  cells analysis, by 2-D GE, 77
    HER-2/neu positive and -negative breast tumors, 87–88
    isoelectric focusing (IEF), 79–80, 83–84
    MASCOT search engine, 87
    paraffin-embedded sections staining, 81–82
    protein sample preparation, 79, 82–83
    SDS-PAGE, 79–80, 84–85
    silver staining and image analysis, 80, 85–86
    tissue block and tissue section preparation, 78–79, 81
    trypsin digestion and MS analysis, 80, 86–87
  development, 161
  different labeling techniques and, 170
  DIGE and, 163–170
  and 2-D GE, 162–163
  gel-free mass spectrometry and, 171–172
  for HCC and non-HCC hepatocytes isolation, 194–195, 199
  LCM lysate, 49–50
  and mass spectrometry analysis, 172–174
  PixCell II instrument, 48–49, 161
  and protein chip technology, 172
  separation methods and, 171
  for tissue sample collection, 44–45
  Veritas™, 48
Laser microdissection and pressure catapulting, 8
LC-ESI-MS/MS. See Liquid chromatography-electrospray ionization tandem mass spectrometry
LCM. See Laser-capture microdissection
LC-MS data. See Liquid chromatography-mass spectrometry data
LC-MS/MS. See Liquid chromatography-tandem mass spectrometry
LDA. See Linear Discriminant Analysis
Legendre moments, 317–319
Levene’s test, 334
Linear Discriminant Analysis, 300–301, 315–316
Liquid chromatography-mass spectrometry data, 370, 374–376, 377
Liquid chromatography-mass spectrometry data analysis, msInspect for, 369
  data viewing and navigation, 371–373
  locating peptides in, 373–376
  low-quality peptides, elimination of, 376
  peptide quantitation, 376–378
  software installation for, 370
Liquid chromatography-tandem mass spectrometry, 170, 171
  label-free, for biomarker identification, 209–210
    albumin/IgG depletion, 211–213
    chromatographic alignment, 218–221
    data transformation and normalization, 222
    HPLC, 212–214
    mass spectrometer, 212, 214
    MS/MS spectral filtering, 216–217
    peptide identification, 217–218
    peptide quantification, 221–222
    statistical analysis, 223
    zoom scan data processing, 214–216
  two-dimensional (2D-LC/MS/MS), 78
LMPC. See Laser microdissection and pressure catapulting
Lysine labeling, 169
MALDI/SELDI protein profiling, of serum, 125–126
  on MALDI-TOF–TOF
    data collection, 131–132
    MB fractionation, of human serum, 131
    protein identification by, 132–133
  MB-based fractionation, 127, 128, 131
  SELDI and MALDI spectra acquisition, 129
  SELDI ProteinChip, 130
  (Magnetic bead based)
  on SELDI-TOF, 133
    ProteinChip arrays, 134–135
    SPA matrix addition, 135
    spectra collection on, 135–138
MALDI-TOF-MS. See Matrix-assisted laser desorption time of flight mass spectrometry
MALDI-TOF, peptide mass fingerprinting (PMF) and, 62, 71
MALDI-TOF–TOF, serum protein profiling on
  data collection, 131–132
  MB fractionation, of human serum, 131
  protein identification by, 132–133
Maleimide labeling, of cysteine sulfhydryls, 96
MARS. See Multiple affinity removal system
MASCOT software, 81, 87–88
Mass spectrometry, 58–59, 214
  ICAT labeling and, 181
    celecoxib, cyclooxygenase-2 (COX-2) and, 183
    cell culture and harvest, 183, 186
    cell lysis, desalting, and protein quantitation, 184–187
    cleavable reagents, 182, 185, 187–188
    cleaving biotin, 186, 189
    labeled peptides purification, 185–186, 188–189
    proteins, denaturation and reduction of, 185, 187
    quantitative proteomic analysis and, 184
  LCM and, 172–174
Matrix-assisted laser desorption time of flight mass spectrometry, 125–126, 142, 163, 194
  LCM and, 171
  for urine protein profiling. See Urine protein profiling, by 2DE and MALDI-TOF-MS
MAVER-1 cell lines, 308
MC3T3-E1, osteoblast cell line, 233, 236–237, 239
MDS technique. See Multi-dimensional scaling techniques
MeOH/CHCl₃ protocol, 106
Metalloproteins, 350
MicroSol-IEF, ZOOM®, 60, 65–66
Miniaturized parallelized sandwich immunoassays. See Suspension antibody microarrays
MS. See Mass spectrometry
MS-Fit software, 81
msInspect, for LC-MS data analysis, 369
  data viewing and navigation, 371–373
  locating peptides in, 373–376
  low-quality peptides, elimination, 376
  peptide quantitation, 376–378
  software installation for, 370
MS/MS spectral filtering, 216–217
Multi-dimensional scaling techniques, 313, 315–317
MultiExperiment Viewer (MeV), 259
Multiple affinity removal system, 59, 63–64
Multiplexed bead-based flow-cytometry assays, 266
Nanoflow reversed-phase LC-tandem mass spectrometry (nanoRPLC-MS/MS), 233, 235, 238–239
Non-enzymatic sample preparation (NESP), 194, 196, 198–199
One-antibody label-based assays, 264–266
One-dimensional liquid chromatography coupled with tandem mass spectrometry (1D-LC-MS/MS), 201–202. See also Hepatocellular carcinoma
¹⁶O/¹⁸O isotopic labeling, 78
Osteoblasts, 232. See also Extracellular matrix
  MC3T3-E1, 233, 236–237, 239
2D-PAGE maps analysis, 291
  dedicated software packages and, 292–294
  image analysis
    fuzzy logic, 311–317
    moment functions, 317–319
  spot volume datasets, analysis of, 294
    cluster analysis, 297–299
    DA-PLS method, 309–311
    linear discriminant analysis, 300–301
    pattern recognition methods, 306–309
    PLS regression and DA-PLS regression, 306
    principal component analysis, 294–297
    SIMCA method, 301–305
PALM microlaser dissector, 161
Parkinson’s disease, 310
Partial least squares regression, 306, 308, 338
Pattern recognition methods
  cluster analysis. See Cluster analysis techniques
  PCA. See Principal component analysis
  proteomic mass spectroscopy and. See Proteomic mass spectroscopy
  SIMCA classification. See Soft-independent model of class analogy method
PCA. See Principal component analysis
PCa-24 protein, in epithelial cells, 172
PDB. See Protein data bank
PDQuest system, 293, 308
Peptide mass fingerprinting, MALDI-TOF and, 62, 71
Peptide/protein separation system, 171
PerkinElmer scanners, 281
Pfam, 352, 360
PIN. See Prostatic intraepithelial neoplasia
PIVKA-II, 194
PixCell II system, 48–49, 77, 82–83, 161. See also Laser-capture microdissection
Planar antibody arrays, 248, 264. See also Antibody arrays
  main formats of, 265
  types of, labeling-hybridization methods and, 266–268
10plex soluble receptor assay, 255–256, 258. See also Bead-based multiplex assays
PLS regression. See Partial least squares regression
PMF. See Peptide mass fingerprinting
PMS. See Proteomic mass spectroscopy
Position-specific scoring matrix, 361
Post-translational modification (PTM) profiling, on selected spots, 71–72
Principal component analysis, 101, 259, 294–297, 308, 315–316, 343. See also 2D-PAGE maps analysis
  Escherichia coli, 307
  for explorative data analysis, 336–338
  in HERMeS software package, 306
  U937 human lymphoma cell line and, 307
Prostatic intraepithelial neoplasia, 44
Protein chip technology and LCM, 172
Protein data bank, 352, 360–361
Protein precipitation, 143–144
Protein profiling of human plasma samples, by two-dimensional electrophoresis, 57
  coomassie brilliant blue G-250 staining, 68
  destaining, in-gel deglycosylation and in-gel tryptic digestion, 61–62, 69
  2D gels preparation and analysis, 61, 67–69
  difference gel electrophoresis (DIGE) system, 59
  free flow electrophoresis (FFE), samples fractionation by, 60–61, 67
  high-abundance proteins depletion, by immunoaffinity column, 59, 63–64
  HPPP, 58
  IPG gel strip rehydration, 64–65
  isoelectric focusing (IEF), with IPG strip, 60, 65
  MALDI plating and peptides desalting, 62, 69–71
  mass spectrometry (MS), 58–59
  microscale solution isoelectric focusing, ZOOM®, 60, 65–66
  peptide mass fingerprinting, MALDI-TOF and, 62, 71
  PTMs profiling, on selected spots, 71–72
  samples preparation, 59, 62
  TCA/acetone precipitation, 64
Proteomic data, statistical analysis, 327
  classical dyes, 339–342
  confirmatory univariate data analysis, 333–335
  DIGE approach, 342–345
  experimental design for, 328
    data processing, 330–333
    pooling, 330
    replicates, 329–330
  exploratory multivariate data analysis, 335
    marker selection, 338–339
    principal component analysis, 336–338
Proteomic mass spectroscopy, 383
  statistical classification strategy (SCS) for
    classifier aggregation, 390
    data visualization, 384–385
    feature selection/extraction (FSE), 386–388
    preprocessing, 385–386
    robust classifier development, 388–390
Proteomics analysis, for tissue sample collection
  formalin fixation, 43–44
  hematoxylin staining, 47–48
  immunocapture procedure, 46
  immunofluorescence staining, 48
  laser-capture microdissection (LCM), 44–45
    AutoPix™, 48
    PixCell II instrument, 48–49
    Veritas™, 48
  LCM lysate, 49–50
  SELDI-TOF-MS, 46
PSSM. See Position-specific scoring matrix
QTC (QT CLUST), 260
Resonance light scattering (RLS), 268
Reverse protein arrays, 268
Rolling-circle amplification (RCA), 268
SCX-LC. See Strong cation exchange liquid chromatography
SDS-PAGE. See Sodium dodecyl sulfate-polyacrylamide gel electrophoresis
SELDI. See Surface-enhanced laser desorption/ionization
SELDI-TOF. See Surface-enhanced laser desorption/ionization time-of-flight
Self Organizing Maps (SOM), 259
Self Organizing Tree Algorithm (SOTA), 259
Shapiro–Wilk test, 334, 339
Significance Analysis of Microarrays (SAM), 259
Silver staining, 80, 332–333. See also Laser-capture microdissection
  and image analysis, 85–86
SIMCA method. See Soft-independent model of class analogy method
SKBR-3, breast cancer cell line, 171
Sodium dodecyl sulfate-polyacrylamide gel electrophoresis, 84–85, 94, 96, 104, 110–111
  isoelectric focusing (IEF) and, 79–80
  PROTEAN II xi Cell system (Bio-Rad) for, 84
Soft-independent model of class analogy method, 301–305, 307–308
Streptavidin-R-Phycoerythrin (SAPE), 267
Strong cation exchange liquid chromatography, 234–235, 238
Strong cation exchange liquid chromatography, of peptides, 233, 234–235, 238
Student’s t-test, 334
2-(4-Sulfophenylazo)-1,8-dihydroxy-3,6-naphthalenedisulfonic acid (SPADNS), 60, 67
Support vector machines, 388–389. See also Proteomic mass spectroscopy
Surface-enhanced laser desorption/ionization, 9, 13, 125–126, 142, 172, 194
  serum protein profiling on, 133
    ProteinChip arrays, 134–135
    SPA matrix addition, 135
    spectra collection on, 135–138
Suspension antibody microarrays, 247–248<br />

bead-based multiplex assays processing,<br />

252–256<br />

limit of detection (LOD), 257<br />

miniaturized multiplexed protein assays,<br />

analytical performance, 256–259<br />

pattern generation, 259–260<br />

principle of, 249<br />

production, coupling to carboxylated<br />

microspheres, 249–252<br />

SVMs. See Support vector machines

TAAs arrays. See Tumor-associated antigen arrays

TCA/acetone precipitation, 2DE and, 64

Tissue sample collection, for proteomics analysis

formalin fixation, 43–44

hematoxylin staining, 47–48

immunocapture procedure, 46

immunofluorescence staining, 48

laser-capture microdissection (LCM), 44–45

AutoPix™, 48

PixCell II instrument, 48–49

Veritas™, 48

LCM lysate, 49–50

SELDI-TOF-MS, 46

Tributylphosphine (TBP), 68

Trichloroacetic acid (TCA) precipitation, 143–144, 146–147, 151

Trifluoroacetic acid (TFA), 182

Tris buffer, 277

TTEST (T-tests), 259

Tumor-associated antigen arrays, 266, 269

Two-dimensional electrophoresis (2DE), 11, 194, 328

biological replicates, 329–330

LCM and, 162–163

for protein profiling of human plasma samples, 57

Coomassie Brilliant Blue G-250 staining, 68

destaining, in-gel deglycosylation and in-gel tryptic digestion, 61–62, 69

2D gels preparation and analysis, 61, 67–69

difference gel electrophoresis (DIGE) system, 59

free flow electrophoresis (FFE), samples fractionation by, 60–61, 67

high-abundance proteins depletion, by immunoaffinity column, 59, 63–64

HPPP, 58

IPG gel strip rehydration, 64–65

isoelectric focusing (IEF), with IPG strip, 60, 65

MALDI plating and peptides desalting, 62, 69–71

mass spectrometry (MS), 58–59

microscale solution isoelectric focusing, ZOOM®, 60, 65–66

peptide mass fingerprinting, MALDI-TOF and, 62, 71

PTMs profiling, on selected spots, 71–72

samples preparation, 59, 62

TCA/acetone precipitation, 64

technical replicates, 329–330

for urine protein profiling. See Urine protein profiling, by 2DE and MALDI-TOF-MS

Two-dimensional fluorescence difference gel electrophoresis (2-D DIGE), 78. See also Difference Gel electrophoresis (DIGE) technology

Two-dimensional liquid chromatography tandem mass spectrometry (2D-LC-MS/MS), 78, 170. See also Liquid chromatography tandem mass spectrometry

for HCC and non-HCC hepatocytes isolation, 195–197, 201–202

Two-dimensional polyacrylamide gel electrophoresis (2D PAGE), 162–163, 174. See also 2D gel electrophoresis, 2D gels

Two-factor ANOVA (TFA), 259

Ultrafiltration technique, 144

Urine protein profiling, by 2DE and MALDI-TOF-MS, 141–142

analytical/profiling techniques, 145–146

organic solvent precipitation protocol, 145, 147–148

protein precipitation, 143–144

TCA/acetone precipitation protocol, 145–147

ultrafiltration-SPE, 144–145, 148–149

urine SPE, 149

Veritas™, 48. See also Laser-capture microdissection

Web-based tools, for protein classification, 349

BLAST, 352, 358

dot-plot style alignment, of protein sequence, 358–359

Ensembl, 352, 356

evolution-based classification schemes, 351

ExPASy, 352

expressed sequence tags (ESTs), 357

GeneScan program, 356


InterPro, 352, 361

MEROPS, 361

metalloproteins, 350

PDB, 352, 360–361

Pfam, 352, 360

PRINTS, 361

PROSITE, 361

sequence and structure of proteins and, 352–356

SMART, 360

Western blotting protocols, 275

XIC. See Extracted ion chromatogram

ZOOM®, MicroSol-IEF, 60, 65–66

Zoom scan triple-play experiment, 214
