
The Genom of Homo sapiens.pdf



The Genome Knowledgebase: A Resource for Biologists and Bioinformaticists

G. JOSHI-TOPE,* I. VASTRIK,† G.R. GOPINATH,* L. MATTHEWS,* E. SCHMIDT,† M. GILLESPIE,‡ P. D’EUSTACHIO, B. JASSAL,† S. LEWIS,§ G. WU,* E. BIRNEY,† AND L. STEIN* †

*Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11790; †European Bioinformatics Institute, Hinxton Outstation, Hinxton, Cambridge, United Kingdom; ‡Department of Biology, St. Johns University, Queens, New York 11439; New York University School of Medicine, New York, New York 10016; §Berkeley Drosophila Genome Project, University of California, Berkeley, California 94720

Biological science now has access to the sequenced genomes of dozens of organisms spanning the phylogenetic gamut from prokaryotes (Fleischmann et al. 1995) to people (Lander et al. 2001). We have reasonable estimates on the number and nature of most of the protein-coding genes, a fact that has radically changed the nature of gene hunting from an activity that is primarily done at the bench to one that is mostly done using the computer. Our knowledge of the genome contents is augmented by high-throughput experimental techniques such as expression chips (DeRisi et al. 1997) and yeast two-hybrid studies (Fields and Song 1989) for probing protein interactions and regulatory networks.

The amount of genomic information is rising exponentially. To manage this onslaught, the bioinformatics community has created a series of gene-centric “catalogs” such as RefSeq (Pruitt et al. 2000; Pruitt and Maglott 2001), GeneCards (Safran et al. 2002), KEGG (Kanehisa and Goto 2000), and YPD (Constanzo et al. 2000).
These databases provide a gene-by-gene view of the genome, presenting all the information known about the structure and function of a gene and its protein product(s) in a single record. These gene catalog efforts have been greatly aided by the Gene Ontology (GO; Ashburner et al. 2000), which provides a detailed controlled vocabulary for describing the subcellular location, molecular function, and associated biological process of a gene product.

However, biological researchers rarely think a gene at a time. Instead, they are concerned with the complex interactions among proteins, protein complexes, nucleic acids, and small molecules that carry out a complex biological process. A great preponderance of papers in the field of molecular biology deal with dissecting the choreographed interactions between ensembles of macromolecules, or with the function of a particular gene product within the larger context of a biological pathway. Hence, there is a mismatch between the “gene at a time” design of the current generation of genome databases, and the whole-pathway approach of much of the scientific literature. The online gene catalogs cannot express the pathway concepts embodied in the literature, and reciprocally, the research community can only obtain fragmentary and sometimes misleading information about pathways from the gene catalogs.

To narrow this gap, we created the Genome Knowledgebase (GK; http://www.genomeknowledge.org), an open, online database of fundamental human biological processes. This paper describes GK, and the lessons we have learned in the two years since we first began the project.

INTENDED AUDIENCE

GK has two target audiences.
One is the wet-bench researcher who has stumbled onto an unfamiliar gene product and wants to get a quick overview of what the product is and what it does. The GK Web site allows such a researcher to quickly locate a gene product, to find the reactions and processes it participates in, and to learn about the role the gene product plays in the larger context of a biological pathway. Included within this target audience are undergraduates, graduate students, and postdocs, who can use GK as a “Cliff Notes” of biological pathways, to rapidly bring themselves up to speed on the foundations of a biological process and its core literature. To this target audience, GK appears as a Web-based publication, an online textbook of molecular biology.

The second target audience is the bioinformaticist who is trying to draw conclusions from a large data set like a series of spotted cDNA expression chip experiments. To this type of researcher, GK appears as a database that can be queried interactively or downloaded in bulk form. The pathway information contained in GK can be superimposed on top of the experimental data set, revealing information about pathways already known and suggesting new relationships.

THE GK WEB SITE

The front page of the GK Web site is shown in Figure 1. From this perspective, GK appears like a top-down textbook of biological processes. The knowledge contained within GK is organized into modules such as “DNA Replication.” Each module has one or more primary authors who are typically experts recruited from the research community, an editor who is a full-time member of the GK staff, and one or more peer reviewers who are also recruited from the community.
A module has an ini-

Cold Spring Harbor Symposia on Quantitative Biology, Volume LXVIII. © 2003 Cold Spring Harbor Laboratory Press 0-87969-709-1/04. 237
