The Genom of Homo sapiens.pdf


242 JOSHI-TOPE ET AL.

and co-immunoprecipitation studies. Because they contain the cutting-edge, unverified information that GK explicitly excludes, they are entirely complementary to GK. In the near future, we hope to establish reciprocal links with BIND, AfCS, and other such resources.

As noted earlier, GK has deliberately limited its scope in several areas in order to make the project manageable. One of the more controversial of these decisions was to omit tissue-specific expression information from the description of proteins. In the GK world, all proteins are expressed in all cells of the human body. This decision makes it difficult to express, for example, differences in the glycolytic pathways in liver and skeletal muscle, and ultimately interferes with the ability of pathway traversal algorithms to correctly detect impossible pathways. Our reasoning is that not too far into the future there will be online atlases of human tissue-specific protein expression patterns derived from oligonucleotide arrays and other high-throughput technologies. This information will close the gap in GK’s data set. In the meantime, we are using a variety of workarounds to describe tissue-specific differences in GK pathways.

How far has GK gone toward our ambitious goal of covering all essential topics in human biology? It has been roughly one year since GK went from development mode into full production.
During that time, we have released ten modules spanning such topics as DNA replication, mRNA processing, RNA transcription, cell cycle checkpoints, and a fair chunk of intermediate metabolism. Roughly speaking, we complete a new module each month, although the size and scope of each module vary tremendously depending on its content.

One crude measure of coverage is to count the number of SwissProt and SwissProt/TREMBL proteins that we have accessioned in GK and then to divide by the total number of human proteins in SwissProt and SwissProt/TREMBL. By this measure, GK covers slightly more than 6% of the information space. However, this procedure is biased downward because it fails to account for the fact that a considerable number of the proteins in TREMBL come from gene predictions and have no known function. A better measure might involve estimating the number of reactions described in medical textbooks or the review literature. We are currently debating how best to accomplish this.

Looking ahead, our biggest priority for the next year of the project is to develop more tools for mining and visualizing GK data. The database is only as good as the tools it offers to bioinformaticists and other researchers for organizing and interpreting large-scale data sets. One proposed visualization tool is a zoomable “starry sky” view of the human physiome in which each protein occupies a fixed position in the night sky and connecting lines define the “constellation” pathways. By superimposing microarray expression data on this view, researchers could see at a glance which pathways are affected by a set of up- or down-regulated genes.

A second priority is to speed the authoring and curation process.
As noted above, we have developed and are now testing an authoring tool that allows authors to interact directly with the GK database. If this tool is successful, we will no longer have to rely on PowerPoint templates as the primary authoring tool. This will free up curators from the task of transferring information from PowerPoint into Protégé and allow us to recruit more authors.

Finally, we need to bring more automation into the curatorial process. GK curators spend a considerable amount of time looking up literature references, matching gene names to SwissProt accession numbers, and checking the descriptions of protein functions against Gene Ontology terms. This process would be greatly aided by a set of software tools to generate a “pre-authored” document that would contain proposed Gene Ontology terms, protein accession numbers, and literature references for the topic currently under review. Over the next year, we will explore a number of approaches to such pre-authoring tools.

ACKNOWLEDGMENTS

The development of Genome KnowledgeBase is supported by grant R01 HG-002639 from the National Human Genome Research Institute at the National Institutes of Health and a subcontract from the NIH-funded Cell Migration Consortium.

REFERENCES

Ashburner M., Ball C.A., Blake J.A., Botstein D., Butler H., Cherry J.M., Davis A.P., Dolinski K., Dwight S.S., Eppig J.T., Harris M.A., Hill D.P., Issel-Tarver L., Kasarskis A., Lewis S., Matese J.C., Richardson J.E., Ringwald M., Rubin G.M., and Sherlock G. (The Gene Ontology Consortium). 2000. Gene ontology: Tool for the unification of biology. Nat. Genet. 25: 25.

Bader G.D. and Hogue C.W. 2000. BIND: A data specification for storing and describing biomolecular interactions, molecular complexes and pathways.
Bioinformatics 16: 465.

Bader G.D., Donaldson I., Wolting C., Ouellette B.F., Pawson T., and Hogue C.W. 2001. BIND: The Biomolecular Interaction Network Database. Nucleic Acids Res. 29: 242.

Costanzo M.C., Hogan J.D., Cusick M.E., Davis B.P., Fancher A.M., Hodges P.E., Kondu P., Lengieza C., Lew-Smith J.E., Lingner C., Roberg-Perez K.J., Tillberg M., Brooks J.E., and Garrels J.I. 2000. The yeast proteome database (YPD) and Caenorhabditis elegans proteome database (WormPD): Comprehensive resources for the organization and comparison of model organism protein information. Nucleic Acids Res. 28: 73.

DeRisi J.L., Iyer V.R., and Brown P.O. 1997. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278: 680.

DuBois P. 2003. MySQL cookbook. O’Reilly and Associates, Sebastopol, California.

Fields S. and Song O. 1989. A novel genetic system to detect protein-protein interactions. Nature 340: 245.

Fleischmann R.D., Adams M.D., White O., Clayton R.A., Kirkness E.F., Kerlavage A.R., Bult C.J., Tomb J.F., Dougherty B.A., Merrick J.M., McKenney K., Sutton G.G., FitzHugh W., Fields C.A., Gocayne J.D., Scott J.D., Shirley R., Liu L.I., Glodek A., Kelley J.M., Weidman J.F., Phillips C.A., Spriggs T., Hedblom E., and Cotton M.D., et al. 1995. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269: 496.

Kanehisa M. and Goto S. 2000. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28: 27.
