The Genom of Homo sapiens.pdf
Annotation of Novel Proteins Utilizing A Functional Genome Shotgun Coupled with High-Throughput Protein Interaction Mapping

J.A. MALEK,* J.M. WIERZBOWSKI,* G.A. DASCH,† M.E. EREMEVA,‡ P.J. MCEWAN,* AND K.J. MCKERNAN*

*Agencourt Bioscience Corporation, Beverly, Massachusetts 01915; †Centers for Disease Control and Prevention, Atlanta, Georgia 30332; and ‡University of Maryland, Baltimore, Maryland 21201

It is quoted frequently that the amount of genomic sequence data is increasing at a tremendous rate while functional methods for studying proteins have not kept pace. Many groups have attempted to address this issue by use of microarrays, yeast two-hybrid screens, and protein complex purification with subsequent identification. The underlying theme of these approaches is their use of “guilt-by-association” (Oliver 2000) methods for annotation of proteins of unknown function. The association of a protein of unknown function with proteins of known function is used to derive a potential function for the protein of unknown function. Although large-scale microarray experiments have increased dramatically and are carried out in numerous large and small laboratories, proteome-wide two-hybrid experiments have only been carried out on yeast (Uetz et al. 2000; Ito et al. 2001) with some large studies in Caenorhabditis elegans (Walhout et al. 2000), and Helicobacter pylori (Rain et al. 2001), among others. Numerous review papers have been written on these large-scale two-hybrid studies, analyzing the data, testing the data’s validity, and using the data to train in silico protein interaction prediction software. The need for further, validated, protein interaction information is clear.
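The "guilt-by-association" idea described above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `suggest_functions`, the `min_support` threshold, and the protein names in the example are all hypothetical, chosen only to show how annotated interaction partners can vote on a candidate function for an unannotated protein.

```python
from collections import Counter

def suggest_functions(protein, interactions, known_functions, min_support=2):
    """Rank candidate functions for `protein` by how many of its
    annotated interaction partners carry each function.

    interactions    -- iterable of (protein_a, protein_b) pairs (undirected)
    known_functions -- dict mapping annotated proteins to a function label
    min_support     -- hypothetical cutoff: minimum number of partners
                       that must share a function for it to be reported
    """
    # Collect every partner of the query protein, ignoring self-interactions.
    partners = set()
    for a, b in interactions:
        if a == protein and b != protein:
            partners.add(b)
        elif b == protein and a != protein:
            partners.add(a)

    # Each annotated partner casts one vote for its own function.
    votes = Counter(known_functions[p] for p in partners if p in known_functions)

    # Sort by descending support, then by name, so the result is deterministic.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(func, n) for func, n in ranked if n >= min_support]


# Hypothetical example data (names are illustrative only).
example_interactions = [
    ("orfX", "ribA"),
    ("orfX", "ribB"),
    ("orfX", "dnaK"),
    ("ribA", "ribB"),
]
example_annotations = {
    "ribA": "riboflavin biosynthesis",
    "ribB": "riboflavin biosynthesis",
    "dnaK": "protein folding",
}

if __name__ == "__main__":
    print(suggest_functions("orfX", example_interactions, example_annotations))
```

A vote count of this kind only proposes a potential function; it says nothing about whether the underlying interactions are physiologically valid.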
Among the challenges in generating proteome-wide interaction data are the lack of fully automated processes, the sheer amount of screening necessary to complete one map for one organism, and an incomplete grasp of what constitutes a true, physiologically important protein interaction. It is our belief that using comparative interaction data will allow deciphering of what interactions are physiologically valid. Validation of interactions has centered around comparisons to databases of individually obtained and presumably more verified interactions, the presence of interactions among proteins with similar expression profiles, and the frequency of interactions among proteins sharing similar biological processes and/or cellular compartments (Deane et al. 2002). Although these methods of verification may add a level of significance to any interaction, their absence should not per se be used to subtract from an interaction’s validity. Observing similar expression profiles between two proteins may suggest they are functionally related but does not mean that they physically interact. Observing interactions among proteins of different biological processes may reveal a gap in our knowledge more than an incorrect interaction. Observation of interactions among proteins from different cellular compartments is less meaningful in organelle-free microbes. Physiologically significant interactions with a wide range of strengths have been observed; therefore, a significance cutoff based on interaction strength cannot be set at present.

Automated DNA sequencing technology was heavily developed during the Human Genome Project into a robust and cheap process.
It would be of benefit to use these developments in advancing proteome-wide interaction data. We have attempted to improve the ease with which such studies can be carried out by adopting a strategy that relies on whole-genome shotgun sequencing and a bacterial two-hybrid system (Dove et al. 1997; Shaywitz et al. 2000). The whole-genome shotgun method generates cloned overlapping fragments of genomic DNA which, if cloned in the proper orientation and frame, can be expressed as a protein. The use of peptide fragments rather than full-length proteins has been shown to reduce false negatives (Ward et al. 2002) while offering the opportunity to localize the domain of a protein responsible for an interaction. Use of the bacterial two-hybrid system allows integration into standard sequencing pipelines. The two vectors used in the system are standard sequencing vectors that are transformed together into an essentially standard cloning strain of Escherichia coli. The system, similar to various yeast two-hybrid systems, relies on recruitment of transcriptional machinery to promoters upstream of reporter genes. Briefly, a protein of interest is fused to the λcI protein which binds a λ operator on the reporter construct (Fig. 1a). A second protein of interest is fused to the RNA polymerase α-subunit. An interaction between the proteins of interest stabilizes the transcriptional machinery at a weak promoter upstream of the reporter construct (Fig. 1b). Interactions are observed as a colony able to grow in the presence of an antibiotic and the absence of any carbon source other than lactose. Colonies can enter a standard sequencing pipeline at this point through the automated colony pickers.
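The requirement that a shotgun fragment be cloned "in the proper orientation and frame" can be made concrete with a small sketch. It rests on simplifying assumptions that are not stated in the text: coordinates are 0-based and half-open, the fusion partner's reading frame continues from the first base of the insert, and a fragment counts only if it lies entirely within the gene. The function names and the example gene are hypothetical.

```python
def in_native_frame(gene, fragment):
    """Return True if `fragment` would be translated in the gene's own
    reading frame when fused at its 5' end (a simplified model).

    gene, fragment -- tuples (start, end, strand), 0-based half-open
                      coordinates on the forward strand, strand "+" or "-"
    """
    g_start, g_end, g_strand = gene
    f_start, f_end, f_strand = fragment

    # Assumption: only fragments lying entirely inside the gene are considered.
    if not (g_start <= f_start and f_end <= g_end):
        return False
    # Orientation: the insert must be read on the same strand as the gene.
    if f_strand != g_strand:
        return False
    # Frame: distance from the gene's first base to the fragment's first
    # translated base must be a multiple of three.
    offset = f_start - g_start if g_strand == "+" else g_end - f_end
    return offset % 3 == 0


def count_expressible(gene, fragment_length):
    """Enumerate every fragment of a fixed length inside the gene, in both
    orientations, and count those that satisfy in_native_frame."""
    g_start, g_end, _ = gene
    total = expressible = 0
    for f_start in range(g_start, g_end - fragment_length + 1):
        for strand in ("+", "-"):
            total += 1
            if in_native_frame(gene, (f_start, f_start + fragment_length, strand)):
                expressible += 1
    return expressible, total


if __name__ == "__main__":
    hypothetical_gene = (0, 300, "+")  # illustrative 300-bp coding region
    expressible, total = count_expressible(hypothetical_gene, 90)
    print(f"{expressible} of {total} fragments are in orientation and frame")
```

Under this model two orientations and three frames leave roughly one fragment in six expressible, which is one reason the overlapping coverage of a shotgun library matters for this approach.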
Sequencing of the bait or prey fragment is conducted with primers specific for either vector.

Cold Spring Harbor Symposia on Quantitative Biology, Volume LXVIII. © 2003 Cold Spring Harbor Laboratory Press 0-87969-709-1/04.